Introduction

This  grant  was  originally  issued  for  the  investigation  of  state-of-the-art  technology 
component-ware  for  distributed  databases  utilizing  CORBA,  as  well  as  the  development 
of  a  fuzzy  spatial  data  model  to  provide  a  way  of  handling  imprecision  and  uncertainty  in 
spatial  data.  Specifically,  the  grant  work  was  to  augment  ongoing  development  of  NRL’s 
Geospatial  Information  Database  (GIDB)  in  the  realm  of  Object  Request  Brokers  (ORBs) 
and  the  GemStone  Object-Oriented  Database  Management  System  (OODBMS). 

Soon  after  the  work  was  initiated,  however,  NRL  began  to  investigate  alternate 
implementation  architectures  for  the  GIDB  relative  to  its  future  stability  and  viability. 
Major  re-design  decisions  were  made,  including:  (1)  a  move  from  the  Smalltalk 
programming  language  to  Java,  (2)  replacement  of  GemStone  OODBMS  with  Ozone 
OODBMS  (Java-based),  and  (3)  replacement  of  CORBA-based  architecture  with  Java 
Remote  Method  Invocation  (RMI)/applet  technology.  These  decisions  have  since  proven 
to  be  invaluable  to  the  development  and  marketability  of  the  GIDB;  however,  because 
this  new  direction  made  much  of  the  proposed  grant  work  in  CORBA  and  related 
implementation  issues  unnecessary,  and  in  consultation  with  the  NRL  principal  scientist, 
the  concentration  of  work  was  redirected  to  the  areas  of  fuzzy  spatial  data  modeling  and 
conflation. 

After  the  original  three  years  were  completed,  a  no-cost  extension  was  requested.  This 
extension  funded  a  Ph.D.  scientific  computing  student  who  graduated  August  2002.  A 
copy  of  the  student’s  dissertation  on  conflation  is  included  with  this  report,  along  with  all 
other  grant-related  publications. 


Summary  of  Findings 

Work  for  this  grant  enabled  the  advancement  of  the  state  of  research  for  areas  of  spatial 
data  modeling,  uncertainty  and  conflation.  Special  journal  issues  co-edited  by  the  grant 
recipient  on  “Spatial  Data  Management,”  “Uncertainty  in  Geographic  Information 
Systems  and  Spatial  Data”  and  “Distributed  Object-Oriented  Systems”  resulted  in  newly 
published  works  from  some  of  the  foremost  researchers  in  the  area,  including  Michael 
Goodchild,  Peter  Fisher,  Hans  Guesgen,  Douglas  Schmidt  and  others. 

Specific  published  work  supported  by  this  grant  included  a  range  of  topics.  Some  of 
these  are:  fuzzy  spatial  relationship  refinements,  spatial  query  interfaces,  uncertainty  in 
distributed  spatial  information  systems,  spatial  relationship  querying,  distributed  spatial 
object-based  systems,  spatial  indexing,  distributed  conflation  model  and  issues,  and  an 
integrated  image  change  detection/conflation  algorithm. 

All  publications  listed  are  included  in  this  report  package,  and  the  reader  is  referred  to  the 
actual  publications  for  details  of  the  topics  listed  above. 

Results

This  grant  has  resulted  in  basic  research  into  fuzzy  spatial  data  models  and  conflation  of 
spatial  data.  It  has  resulted  in  the  publication  of  eight  conference  papers,  three  book 
chapters,  four  journal  articles  and  three  edited  journal  issues.  It  has  also  funded  one 
Scientific  Computing  Ph.D.  student  (graduated  August  2002).  Grant  funding  also 
enabled  the  USM  PI  to  perform  the  following  service  activities  to  promote  research  and 
knowledge  in  relevant  subject  areas: 

❖  Co-organized  and  chaired  panel  session,  "Can  Statistical  and  Fuzzy  Sets  Approaches 
Complement  Each  Other  in  Dealing  with  the  Problem  of  Uncertainty  in  Spatial  Data?"  at 
the  International  Symposium  on  Spatial  Data  Quality  '99,  Hong  Kong  (1999). 

❖  Organizing  co-chair  and  chair  of  four  special  conference  sessions: 

"Uncertainty  Management  in  Spatial  Data"  sessions  held  at: 

Information  Processing  and  the  Management  of  Uncertainty  (IPMU  ’98),  Paris, 
France (1998),  and 

International  Conference  on  Fuzzy  Systems  (FUZZ-IEEE  ’98),  Anchorage,  AK 
(1998). 

❖  "Distributed  Object-Oriented  Systems"--two  sessions 

1998  International  Conference  on  Parallel  and  Distributed  Computing  and  Systems 
(PDCS '98),  Las  Vegas,  Nevada  (1998). 

❖  "Spatial  Data  Computation"-three  sessions 

First  Southern  Symposium  on  Computing  (SSC  ’98),  Hattiesburg,  MS  (1998). 

The Geographic Information System (GIS) is an integrated technology that incorporates
concepts from computer graphics, spatial modeling, and database management.
The distributed intelligent mobile agent technique, which incorporates more
powerful technology, is becoming an important topic in Geographic Information Systems.
Within the distributed environment, compatibility and consistency are the main
concerns. Conflation is an important and challenging technique for handling
these issues.

Generally, conflation means combining information from different sources to
produce better information. The scope of conflation in GIS research has broadened
considerably, and many efforts have been made. However, conflation remains a
challenging research field because of the complexity of real applications. Existing
conflation algorithms have been ad hoc, designed for specific purposes. The
dissertation focuses mainly on vector-based conflation problems.

Given the weakness of existing conflation algorithms in dealing with time,
the dissertation explores ways in which conflation capabilities
can be augmented with the aid of change detection techniques. A general and flexible
conflation model is proposed for distributed mobile agent systems.

Based on the model, considerable effort is devoted to the development
of an image change detection algorithm. Since image change detection, like many
other GIS applications, requires handling fuzziness and uncertainty, an intelligent
approach is investigated in which fuzziness and uncertainty are tackled by
introducing a Certainty Factor. A hierarchical structure for the fuzzy inference is
worked out. Theoretical analysis and evaluation on real images show that it can
provide significant results.

Chapter 1

INTRODUCTION 


The Geographic Information System (GIS) is a computer-based information system
that can deal with collecting, modeling, managing, analyzing, and integrating spatial
and non-spatial data in geographic applications. It has experienced steady and
unprecedented growth in general interest, theory development, and new
applications over the last decade or so. To date, GIS has matured to serve important
roles in many academic disciplines, government organizations, and commercial
enterprises.

The handling of geographic information has always raised issues of a scientific
nature. In general, GIS is an integrated technology that combines concepts from computer
graphics, spatial modeling, database management, and remote sensing. However, no
single technology can by itself fully meet the requirements of a GIS application [3]. There
is wide agreement in the GIS community that the future success of GIS technology
will depend to a large extent on incorporating more powerful analytical capabilities.

The  distributed  intelligent  mobile  agent  technique  is  becoming  an  important  issue 
in  GIS  research  areas  due  largely  to  the  following  technological  advantages: 

•  During the past 20 years, computer architecture has moved from stand-alone
systems to local and wide-area networks. A natural extension of this development
is to study distributed technology for geographic information. Recently, the
trend for most types of information systems has been a move to a more loosely
coupled, distributed nature. Typical GIS applications are GeoChange [14] and the
Alexandria Digital Library Project [16], which are already underway to provide
users access to geographic data across wide area networks.

•  Recent decades have been characterized by the widespread application of artificial
intelligence technology to improve system capabilities. We have witnessed a
rapid growth in the number and variety of applications of intelligent technology,
ranging from real-time control systems and decision support systems to business
systems and the World Wide Web. Since many applications in GIS involve
human expertise and knowledge, which are invariably imprecise, incomplete, or
not totally reliable, intelligent technology becomes a potential tool for solving
GIS problems.

•  Moreover, a quiet revolution emerging in the GIS industry shows that GIS is going
mobile.

Recently, distributed intelligent mobile agent systems have been developed for
different purposes and applications. The project in [9] presents an autonomous agent
system that uses distributed intelligent agent techniques to handle geo-spatial
data from multiple heterogeneous sources.

Within the distributed environment, data updating, integration, and sharing
become central issues in GIS. With these issues, new problems arise, such as
incompatible data structures, incompatible accuracy, incompatible scales of
measurement, and inconsistencies. The most common reason for such problems
is that spatial datasets from different sources might use different projection
systems, different scales and accuracies, and so on. The important and
challenging technique for dealing with such problems is called conflation.

Generally, conflation is regarded as the combination of information from two or
more digital maps to produce a third map that is better than any of its component
sources. The objectives of conflation include increasing spatial accuracy and consistency,
updating or adding features in datasets, and updating or adding attributes
associated with those features.

One weakness of conventional conflation paradigms is the lack of
capability to deal with time. If an existing source of information fails to meet the
requirements because it is out of date, a process of detecting changes and then updating
may be more effective than a total reconstruction. Thus, it is important to identify
differences in the images in terms of real changes.

Traditionally, changes to geographical phenomena have been derived from a temporal
reference frame. Since time is difficult to formalize [22], no single GIS
model that includes time has been adopted to date. In recent years, by integrating
more powerful techniques, GISs have offered many possibilities for improved treatment
of time and change. Recent research results have demonstrated that change detection can
be done from satellite images or other images, accompanied by visual interpretation of
differences. Image change detection is therefore a promising technique for
improving conflation capabilities.

The objective of image change detection is to detect changes in images over time.
Augmenting conflation by utilizing satellite images is the motivation of the dissertation.

The rest of the dissertation is organized as follows. Chapter 2 gives an
overview of related work and shows current methodologies for dealing with conflation
problems and image change detection. Considering a vector GIS, a more general and
flexible conflation model is developed for a distributed system in Chapter 3. Based
on the model, two important contributions are emphasized in the following chapters.
Chapter 4 introduces a vector-based conflation scheme. A hybrid approach for
image change detection is proposed in Chapter 5 and evaluated using real raster
images in Chapter 6. In the conclusion, the open research issues are discussed and
future research ideas for possible improvements of the algorithm are provided.


Chapter  2 
BACKGROUND 


Geographic information provides the basis for many types of decisions. With respect
to conflation, the merging of geographical data allows for manipulation, analysis, and
comparison within and between multiple geographical databases.

Too often, conflation methods have been ad hoc, designed for specific purposes.
In this chapter, an overview of previous work related to data conflation is given first.
Then, based on the geometric view and GIS data models, conflation problems
are classified into different groups so that they can be better understood. A
general conflation method for vector data is summarized. For the purpose of augmenting
conflation, the image change detection issue is addressed, and current and potential
approaches to image change detection are surveyed.

2.1  Data  Conflation  Issue  and  Related  Work 

When geo-spatial data from different sources are combined and shared, incompatibilities
and/or inconsistencies in geographical data occur. Figure 2.1 shows a problem
arising from merging data digitized from adjacent map sheets.


Figure  2.1:  Unmatched  edge  occurring  in  a  cartographic  image. 

However, the incompatibilities may be not only structural but also semantic in
nature. For example, attributes representing the same values might be defined
differently in different sources, with different names or different domains
for their associated values, e.g. float versus integer.

It is therefore quite a challenge to preserve the semantics inherent in the component
datasets. The important technique for handling this situation is “conflation”.
The term “conflation” refers to the integration of data from different sources.
Conflation is the complex process of recognizing and removing inconsistencies (or
discrepancies) in geographical feature data that are stored in multiple databases.

The history of map conflation goes back to the early to mid-1980s. The first clear
development and application of an automated conflation process occurred during a
joint United States Geological Survey (USGS) and Bureau of the Census project designed
to consolidate the agencies’ respective digital map files of U.S. metropolitan areas [51].
The major concern of conflation was to eliminate spatial inconsistency and improve
the spatial accuracy of maps. The implementation of a computerized system for this
task provided an essential foundation for much of the theory and many of the techniques
used today.

Since that time, others, including numerous agencies, institutions and commercial
GIS vendors, have developed and implemented conflation tools within their applications.
The scope of conflation in GIS has become much broader. Some practical examples
are provided below.

The EDNA project of the USGS investigated two different processing approaches
for conflation in order to share common attributes from separate coverages [58]. Atlas
has developed a conflation engine that automatically finds the common elements in two
vector data sets; the engine is written in C++ and uses an object-oriented topological
GIS infrastructure [57]. Because of significant differences between the TIGER and DALIS
Project geographies, the project in [11] provides a method that conflates the Census 1990
Block coverage to the DALIS base. It demonstrates the importance of being
able to effectively relate all data sources to other valuable attribute information files
such as census-attributed data.

At present, automatically conflating digital map data presents a rich set of
computational challenges. Based on geo-statistical techniques, some primary conflation
paradigms are presented in the literature, and have been proven to be very effective for
regular data such as street networks. In [42], a geo-statistical method for combining data
from different sources is presented, and its reliability is analyzed. However, it is surprising
that very little progress has been made since the first successful implementation of a
system based on geo-statistical techniques. One of the main reasons is the complexity
and uncertainty of GIS problems.

As GIS becomes more complex and intelligent, artificial-intelligence-based methods
will become a cost-effective approach for improving its problem-solving abilities. More
recently, researchers have turned to fuzzy logic and other intelligent methods for
reasoning under conditions of uncertainty to help solve general conflation
problems. This approach certainly shows greater promise for producing a wider range
of acceptable results; however, models for implementation are still limited. The
following gives a brief survey of some related work.

The Air Force Rome Laboratory Multiple Data Base Integration and Update system
[35] reviews the various technical challenges encountered during automated vector
conflation and all-source updating of a unified vector data baseline, and highlights its
use of advanced technologies such as Artificial Neural Networks. In [7] a
rule-based approach for conflation of attributed vector data is introduced, which was
performed within the context of the Digital Mapping, Charting & Geodesy Program
(DMAP) at the Naval Research Laboratory (NRL), Stennis Space Center, Mississippi.
A knowledge-based system for handling uncertainty in conflation in a distributed
environment is developed in [8]. The paradigm is based on a hierarchical rule-based
system that uses techniques for reasoning under uncertainty. Models of this kind
are proposed to overcome some of the more difficult feature matching cases encountered
with more generic, irregular geographic features.

Although some improvement has been made, the issues associated with conflation have
serious consequences for many applications, which range from combining digitized
topographic maps, to edge-matching of misfit data across boundaries, to combining
information from different sensors in remote sensing. There is no doubt that users are faced
with extensive duplication of effort and unnecessary cost without the ability to
conflate/integrate data from different sources.

As  a  result,  conflation  typically  is  needed  because: 

•  Users  wish  to  update  their  information  without  losing  legacy  data  which  may  not 
be  included  in  the  new  information; 

•  One  source  may  be  more  accurate  with  respect  to  information  such  as  position 
and  attribute;  and, 

•  One  source  contains  information  missing  in  another,  such  as  additional  features, 
feature  attributes  or  even  entire  coverages. 



2.2  Conflation  Classification 

Conflation means combining information from different sources to produce
better information. From the geographic point of view, GISs are spatial reference
systems in which spatial data are generally treated in two dimensions, i.e. longitude and
latitude directions. The spatial reference system provides the geometric basis to connect
the different sources. The combinations of different sources in the spatial reference
systems may result in three different versions of conflation. More details are given as
follows:

•  Data conflict when combining the same type layers of a region;

•  Data  overlay  when  combining  the  different  type  layers  of  the  same  region;  and, 

•  Edge  matching  when  combining  the  same  type  layers  of  neighboring  regions. 

A  general  conflation  classification  is  shown  in  Figure  2.2. 


Figure  2.2:  A  general  conflation  classification  based  on  spatial  geometry 
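
The three cases listed above can be read as a small decision rule on two questions: do the layers have the same type, and do they cover the same region or neighboring regions? The following is a minimal Python sketch of that rule. The `Layer` class, its field names, and the `neighbors` set are hypothetical names introduced only for illustration; they are not part of the dissertation's model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    """Hypothetical description of one source layer."""
    layer_type: str  # e.g. "roads" or "hydrography"
    region: str      # identifier of the region or map sheet covered


def classify_conflation(a, b, neighbors):
    """Return which of the three conflation cases applies to layers a and b.

    neighbors is a set of frozensets, each holding two adjacent region ids.
    """
    same_type = a.layer_type == b.layer_type
    same_region = a.region == b.region
    adjacent = frozenset((a.region, b.region)) in neighbors

    if same_type and same_region:
        return "data conflict"   # same type layers of one region
    if not same_type and same_region:
        return "data overlay"    # different type layers of one region
    if same_type and adjacent:
        return "edge matching"   # same type layers of neighboring regions
    return "unclassified"        # combination not covered by the three cases
```

Combinations outside the three listed cases (for example, different layer types in neighboring regions) are reported as "unclassified" rather than forced into a category, since the text does not say how they should be treated.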

The objective of horizontal conflation is to eliminate the spatial feature position
and attribute discrepancies that exist in the common area of two sources [64].
Vertical conflation is actually a process that performs an overlay operation on datasets
from different layers of the same region. It requires that the two sources are identical.
Also, since none of the sources is perfectly error-free, the apparent change includes some
amount of error mixed with the actual change.

The typical applications of vector-based conflation include [64]:

•  Spatial Discrepancy Elimination: A global adjustment process for spatial
feature coordinates that eliminates feature position discrepancies. This
application arises in map edge-matching, map compilation, etc.

•  Spatial Feature Transfer: In this process, the common features from different
sources are recognized and flagged; new features can be added to the old source,
or old coordinates can be updated. The main applications are in GIS spatial
data updating.

•  Attribute Transfer: Usually used to transfer attributes from the source with lower
spatial accuracy to the source with higher accuracy, or from an old version to a new version.


Figure  2.3:  A  conflation  classification  based  on  GIS  data  model 

Based  on  the  data  models  in  GISs  shown  in  Figure  2.3,  the  conflation  can  be  classified 
as  vector-vector,  vector-image,  and  image-image  conflation. 

The  most  common  conflation  problem  is  the  vector-vector  conflation  since  many 
products  in  GIS  are  based  on  the  vector  data.  Conflating  vector  data  with  imagery  is 
also  an  important  aspect  in  GIS  applications.  It  involves  many  digital  image-processing 
technologies  such  as  image  feature  extraction,  shaping,  re-sampling,  model  recognition, 
and  so  on.  Image  to  image  conflation  involves  more  image  matching  technologies. 
Usually,  it  is  performed  by  specialists. 

2.3  A General Conflation Method for Vector Data

The goal of conflation is to have the “best” information for a given area. As the
preceding overview shows, many algorithms have been developed to solve
different practical conflation problems. For vector database users, the objective of
conflation is to combine different source coverages into one, and to reconcile the best
feature geometries and their related attributes from the different source coverages. The
general conflation processing steps are shown in Figure 2.4.


Figure  2.4:  Steps  involved  in  a  general  conflation  process 


•  Positional  re-alignment 

Since several coordinate systems such as the Universal Transverse Mercator, Global
Positioning System, and Local Cadastral System may be used in a distributed system,
integrating information from different sources requires transformation of the
coordinate system of the source to the one adopted for the specific spatial database
in use. Positional re-alignment is a mathematical procedure in which previously
identified matching features are brought into spatial agreement.

•  Feature  matching 

Simply  and  perhaps  somewhat  obviously  stated,  feature  matching  involves  the 
identification  of  features  from  different  maps  as  being  representations  of  the  same 
geographic  entity. 


In fact, feature matching can be considered a type of classification problem.
That is, we are trying to determine whether one feature belongs to the same “class”
as another. This type of problem can be handled through theories of evidential
reasoning under uncertainty such as fuzzy logic [29] or Dempster-Shafer theory [54].

•  Deconfliction

It is a process in which contradictions in a matching pair’s attributes and/or
values are resolved.
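
The three steps above can be illustrated with a deliberately simplified Python sketch. It rests on several assumptions that are not taken from the dissertation: features are point features stored as dictionaries with hypothetical keys `"xy"` and `"attrs"`, re-alignment is reduced to a mean translation estimated from control point pairs (a real system would apply a full coordinate transformation), the degree of match is a simple linear fuzzy membership on distance, and deconfliction just lets a preferred source win.

```python
import math


def estimate_shift(control_pairs):
    """Positional re-alignment: mean translation from source to target points.

    control_pairs is a list of ((sx, sy), (tx, ty)) pairs already identified
    as the same location in both sources.
    """
    n = len(control_pairs)
    dx = sum(t[0] - s[0] for s, t in control_pairs) / n
    dy = sum(t[1] - s[1] for s, t in control_pairs) / n
    return dx, dy


def realign(features, shift):
    """Apply the estimated translation to every feature position."""
    dx, dy = shift
    return [{**f, "xy": (f["xy"][0] + dx, f["xy"][1] + dy)} for f in features]


def match_degree(f, g, tolerance):
    """Fuzzy membership in 'same entity': 1 at distance 0, 0 at tolerance or beyond."""
    d = math.dist(f["xy"], g["xy"])
    return max(0.0, 1.0 - d / tolerance)


def match_features(source, target, tolerance, threshold=0.5):
    """Feature matching: pair each source feature with its best target candidate."""
    pairs = []
    for f in source:
        best = max(target, key=lambda g: match_degree(f, g, tolerance), default=None)
        if best is not None and match_degree(f, best, tolerance) >= threshold:
            pairs.append((f, best))
    return pairs


def deconflict(pairs, prefer="target"):
    """Deconfliction: on contradicting attributes the preferred source wins."""
    merged = []
    for f, g in pairs:
        loser, winner = (f, g) if prefer == "target" else (g, f)
        # Later keys override earlier ones, so the winner is unpacked last.
        attrs = {**loser["attrs"], **winner["attrs"]}
        merged.append({"xy": winner["xy"], "attrs": attrs})
    return merged
```

Attributes present in only one source survive the merge, which corresponds to the stated aim of updating without losing legacy data. The distance-only membership function stands in for the richer evidential reasoning (fuzzy logic or Dempster-Shafer theory) mentioned above.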

2.4  Change  Detection  Issue  and  Related  Work 

Change detection has been widely studied in the fields of computer vision, remote
sensing, and image processing. Changes over time in a geographical area are particularly
important  in  such  applications  as  deforestation,  archeology,  environmental  monitoring, 
urban  planning  and  development,  damage  assessment,  and  so  on. 

Image  change  detection  is  a  process  of  identifying  differences  in  the  state  of  an 
object  or  phenomenon  by  comparing  the  images  at  different  times.  Since  the  early 
1970s,  satellite  images  in  digital  form  have  been  available.  This  vast  amount  of  satellite 
data  offers  unique  possibilities  of  comparing  images  from  an  earlier  date  with  new  ones. 
In  [44]  several  satellite  image  change  detection  techniques  are  investigated,  and  results 
show  that  the  satellite  image  change  analysis  can  provide  good  information.  Moreover, 
remarkable advances in remote sensing technology have motivated many researchers to
exploit change detection approaches and techniques for a variety of purposes, such as
identifying buildings, monitoring regional changes, land use, wide area surveillance over
a long period of time, etc. [50]-[55], [21].

In the recent decade, change detection from remote sensing images has emerged
as one of the most active and fruitful areas in GISs. Such applications can be found
in many GISs, for example, a change detection approach for linear features in aerial
photographs in [50], digital change detection techniques for civilian and military
applications in [10], and for wide area surveillance in [37]. Specifically, the paper [47]
investigated the applicability of image detection techniques for the purpose of emergency
response by utilizing remotely sensed bi-temporal imagery data.

The  successful  application  of  the  change  detection  techniques  from  images,  and  the 
emphasis  on  near  real-time  and  new  information  encourage  us  to  use  change  detection 
techniques  for  augmenting  conflation. 

2.5  Image  Change  Detection  Methods 

There are many methods and techniques for the detection of changes, which have
been developed by mathematicians, geographers, philosophers, and computer scientists.
The basic and commonly used techniques are briefly described as follows [10, 26]:

Image Deviation: Image deviation is a basic and simple method for analyzing
long time series data. With image deviation, the change areas are identified by
contrast with a long-time average. The possibilities would be to produce a mean
image over the whole series, to examine trends in environmental changes, or to detect
significant anomalies from the general trend.

Image Differencing: Image differencing is one of the simplest techniques for
detecting changes between two images. With this, each pixel from an image is
subtracted from its corresponding pixel in another image. The resultant image
represents the change between the two different times.

Cross-classification: It is a procedure used to compare two images by calculating
the logical AND of all possible combinations on the two classified input images.
The aim is to evaluate whether areas fall into the same class on the two dates or
whether a change to a new class has occurred.
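
The three basic techniques can be sketched in a few lines of plain Python. This is an illustrative sketch only: images are assumed to be equally sized, co-registered lists of rows, and the function names are hypothetical. Cross-classification is written as a count table, which is equivalent to evaluating the logical AND "class i at the first date AND class j at the second date" for every pixel and every class combination.

```python
from collections import Counter


def image_difference(earlier, later):
    """Subtract each pixel of the earlier image from the corresponding later pixel."""
    return [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(earlier, later)]


def mean_image(series):
    """Pixel-wise mean over a time series of images."""
    n = len(series)
    rows, cols = len(series[0]), len(series[0][0])
    return [[sum(img[r][c] for img in series) / n for c in range(cols)]
            for r in range(rows)]


def image_deviation(image, series):
    """Deviation of one image from the long-time average of the series."""
    return image_difference(mean_image(series), image)


def cross_classification(classes_t1, classes_t2):
    """Count pixels for every (class at date 1, class at date 2) combination."""
    table = Counter()
    for row_a, row_b in zip(classes_t1, classes_t2):
        table.update(zip(row_a, row_b))
    return table
```

In the resulting table, entries whose two class labels are equal correspond to unchanged areas, while entries with differing labels indicate a change to a new class. Nonzero values in a difference or deviation image still have to be thresholded before they are interpreted as real change, and the threshold is application dependent.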

The basic techniques fall into two different kinds of change detection
technique: differencing and classification. They provide a basis for later change
detection techniques.

Since the basic methods are used for small areas, some efficient algorithms have been
developed for specific purposes to deal with larger areas. In [48],
three methods with different motivations are introduced: linear change
detection for small changes, non-linear change detection for large changes, and Delta
filtering for enhancing specific patterns of change.

Moreover, an edge-based change detection algorithm was developed in [34], which
requires a single target region and a reference region to be supplied. A target region
is the area of interest, in which a change is to be detected. A
reference region is an area that represents a homogeneous textural area where no change
occurs. Since the edge-based change detection algorithm depends heavily on accurate
reference region information, each target must have an appropriate reference region.
Ideally, the reference region covers an area that is identical to the area of the target
region when no change has occurred.

No matter what the change detection algorithm is, performing the change detection
requires the same data scale or model. A survey of the research field of
image change detection shows that most of the leading image change detection
algorithms are based on raster data.

In the real world, a system may hold data in two different formats, such as vector
and raster. Some hybrid methods for change detection have been investigated. In [34] a
customized Land Change Detection Tool was developed by integrating raster and vector
data.

A number of comparative studies show that there is no universally optimal change
detection method; the choice depends on the application. However, the integrated
methods provide a good starting point for designing a new approach to image change
detection.


2.6  Change  Detection  Consideration 

Although the image change detection techniques based on comparison, such as image
deviation and differencing, appear to be quite rich, they are not suitable for real-time
detection since much larger time series data are required. The classification
techniques for image change detection, in turn, require exact matching of different categories.
Considering the raster model, this dissertation is mainly concerned with two important
change detection techniques based on statistical theory.

•  Correlation  Analysis 

Simple correlation analysis can be used to detect changes within a certain range.
There is a high correlation between image data for regions that have not changed
significantly, and a relatively low correlation between regions that have changed
substantially. Correlation analysis generally fails to detect a consistent change,
especially if the images are acquired under different illumination conditions.

• Principal Components Analysis (PCA)

As indicated for the previous method, the correlation coefficient is not sufficiently
stable with respect to minor changes in the intensity values of the pixels. To
resolve this problem, a PCA method was proposed [61].

Principal components analysis is a powerful technique for extracting structure
from a potentially high-dimensional data set. In recent years it has found
a number of applications in the fields of computer vision and pattern recognition.
PCA is based on representing typical images in terms of a compact set of
orthogonal basis images. The main ideas behind PCA are:

A principal component P is a linear combination of the observed features:

$$P = \sum_{i=1}^{d} w_i X_i, \qquad (2.1)$$

where $X_i$ is the $i$th feature and the weight $w_i$ is chosen to maximize the ratio of
the variance of P to the total variation, subject to the constraint

$$\sum_{i=1}^{d} w_i^2 = 1. \qquad (2.2)$$

The  principal  components  are  obtained  by  computing  the  eigenvectors  of  the 
variance-covariance  matrix  of  the  features.  The  importance  of  each  eigenvector 
is  reflected  by  the  associated  eigenvalue.  Each  instance  feature  vector  can  be 
represented  in  terms  of  a  linear  combination  of  principal  components. 


Kernel principal component analysis was recently proposed as a nonlinear extension
of PCA. The basic idea is to first map the input space into a feature
space via a nonlinear mapping and then compute the principal components in
that feature space [27]. Kernel PCA has already been shown to provide better
performance than linear PCA in several applications [53].

However, a frequent question is how many principal components are adequate
for a particular situation. Different criteria have been proposed. Some
criteria set a threshold for eigenvalues so that principal components with associated
eigenvalues less than this threshold are deleted. Some criteria set a threshold
for the variation accounted for and select the principal components with larger
eigenvalues first until the threshold is reached. Sometimes a fixed number of
principal components with the largest eigenvalues is simply selected [17].
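
The ideas above can be illustrated with a small pure-Python sketch. It builds the
variance-covariance matrix, approximates the leading eigenvector by power iteration
(a stand-in chosen here for brevity, not necessarily the procedure used in [61]), and
applies the "variation accounted for" criterion. All function names and the sample data
are assumptions made for illustration.

```python
import math


def covariance_matrix(samples):
    """Sample variance-covariance matrix; each row of `samples` is one observation."""
    n, d = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(d)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def leading_component(cov, iterations=200):
    """Power iteration: returns (eigenvalue, weights w) with sum(w_i^2) = 1, as in (2.2)."""
    d = len(cov)
    w = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    norm = math.sqrt(sum(x * x for x in w))
    w = [x / norm for x in w]
    for _ in range(iterations):
        v = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            break  # start vector carries no variance direction; keep current w
        w = [x / norm for x in v]
    eigenvalue = sum(w[i] * cov[i][j] * w[j] for i in range(d) for j in range(d))
    return eigenvalue, w


def components_to_keep(eigenvalues, fraction):
    """Smallest number of largest eigenvalues that accounts for `fraction` of the variation."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= fraction:
            return k
    return len(ordered)


samples = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]
eigenvalue, w = leading_component(covariance_matrix(samples))
```

For these perfectly correlated features the weights come out equal, so the first
component P in (2.1) is proportional to the sum of the two features.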

From a real-time or near-real-time point of view, there is little doubt that correlation
analysis is the preferable choice.
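
A minimal sketch of correlation-based change detection follows, assuming the pixel values
of one region at two dates are given as flat lists. The 0.8 cut-off and the function names
are assumptions for illustration only.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long pixel sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for a constant region; treated as 0 in this sketch
    return cov / math.sqrt(var_a * var_b)


def region_changed(region_t1, region_t2, min_correlation=0.8):
    """Report a change when the correlation drops below the assumed cut-off."""
    return pearson(region_t1, region_t2) < min_correlation


before = [10, 20, 30, 40]
brighter = [15, 25, 35, 45]   # same scene with a uniform brightness offset
altered = [40, 10, 35, 12]    # scene content has changed
```

Note that `brighter` correlates perfectly with `before`, which illustrates why a consistent
change is not detected by correlation analysis.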


Chapter  3 

A  FLEXIBLE  CONFLATION  MODEL 


Although many efforts have been made to solve conflation problems, conflation is still a
challenging research field because of the complexity of real applications. More specifically,
it appears that a satisfactory level of reliability has not been achieved from the real-time
or near-real-time point of view. Therefore, further attention is given here to enhancing
conflation capability.

Assuming that the data on hand is in a vector format such as Vector Product Format
(VPF), and that the latest data comes from satellite imagery, this dissertation
explores ways in which conflation accuracy can be achieved
with the aid of change detection techniques. The emphasis on vector conflation
comes from the fact that many vector products in GIS already exist and are in field use.
With vector databases, the dissertation will focus on vector-vector conflation problems.


3.1  Data  Models  in  GIS  and  Their  Conversion 

There  are  two  main  models  for  the  representation  of  data  in  a  GIS,  that  is,  the  raster 
model  and  the  vector  model  [28].  These  models  are  the  basis  of  conflation. 

The raster data model is based on an array or grid of square cells; each cell represents
a square parcel of the real world. A value is assigned to each cell, which represents an
attribute of the real-world parcel. Because of its data structure, the raster model is
the best format for applications such as slope, land cover types, remote sensing,
satellite imagery, and aerial photographs.

The vector data model represents objects as points, lines, and polygons that are
referenced to real-world objects using coordinate systems. The vector model also allows
topological information to be stored. The primary limitation of the vector model is the
inability to represent, analyze, and process continuous data such as aerial photographs
and satellite images.

Figure 3.1 shows real images based on the raster model. Figure 3.1(a) presents
Morphological Raster Data, which includes building, open area, water, and hard surface.
Figure 3.1(b) provides the building outlines represented as raster data and as polygons,
respectively. The difference between the raster and vector representations of a building
outline is shown clearly in Figure 3.1(c).


Figure  3.1:  Examples  of  Raster  and  Vector  Data 


In general, the vector data structure produces smaller files than a raster image
because a raster image needs space for all pixels while only point coordinates are stored
in the vector representation. Besides the size issue, vector data is easier to handle
than raster data because it has fewer data items and is more easily adjusted
to a different scale. Although the vector data structure is the primary
form for handling graphical data in most GISs, vector data acquisition is often
more difficult than raster image acquisition because of its abstract data structure and
the topology between objects and their associated attributes.

Data conversion refers to the process of converting data from one format to
another. In the real world, such a conversion is necessary because different data
formats suit different purposes. For example,

• The data on hand is in a vector format, but the latest update comes in satellite
image format (raster). Such applications include creating a vegetation map from
classified satellite data [56].

• One may use statistical tools to analyze the information based on a raster format,
but the data on hand is in vector format. This requires converting a vector dataset
to a raster image. A typical application is to compare a satellite image with
census data.

Today many vendors provide tools for conversion between the two
models. Therefore, it is not difficult to deal with both vector data and raster data in
real applications.
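
To make the vector-to-raster direction concrete, the following sketch rasterizes one
polygon by testing the centre of each grid cell with a ray-casting point-in-polygon test.
It is a simplified illustration under assumed names (`rasterize`, `cell_size`), not a
description of any vendor tool.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height, cell_size=1.0):
    """Return a height x width grid with 1 where the cell centre lies inside the polygon."""
    grid = []
    for row in range(height):
        cy = (row + 0.5) * cell_size
        grid.append(
            [
                1 if point_in_polygon((col + 0.5) * cell_size, cy, polygon) else 0
                for col in range(width)
            ]
        )
    return grid


square = [(1, 1), (3, 1), (3, 3), (1, 3)]
grid = rasterize(square, width=4, height=4)
```

Row 0 of `grid` corresponds to the smallest y values. The reverse direction
(vectorization) is considerably harder, since boundaries have to be traced and simplified.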


3.2  A  Mobile  Agent  Prototype  Model  for  Conflation 

With a growing abundance of geo-spatial data, situations where more than one
data set is available to meet a particular need are becoming common [20]. In order to
create the best possible database, data conflation is needed. For performing data
conflation, a general distributed mobile agent model can be designed as shown in Figure 3.2.


The architecture consists of a primary, centralized database and multiple, heterogeneous
databases. Based on this architecture, there exist multiple agent classes. A brief
description is given as follows:

• QM is a queue manager: It is responsible for supervising a priority queue generated
by the ROI agents, where ROI stands for Region Of Interest.

• AM is an agent manager: Its responsibility is to coordinate the agent classes.

•  RA  represents  ROI  Agent:  Each  RA  is  responsible  for  managing  updates  for 
a  particular  Region  of  Interest.  RAs  are  static,  remaining  on  the  centralized 
database. 

•  CA  represents  Conflation  Agent:  The  CA  is  a  super  class  of  many  specialized 
agent  classes  that  have  extensive  knowledge  about  their  domain  relevant  to  the 
conflation  process.  Moreover,  CA  is  the  intelligent  mobile  agent  which  travels  to 
the  heterogeneous  databases  to  perform  conflation. 

•  QA  represents  Querying  Agent:  It  is  released  by  the  centralized  database  to  gather 
information  for  general  or  conflation-related  queries.  The  generator  or  creator  of 
QAs  varies  with  the  different  purposes. 

A typical scenario of performing conflation can be described by the following process:

Given a specific task, the information related to the task might be stored
in multiple databases. The system will generate and launch QAs to collect data from
the sources. When the customized and intelligent CAs travel to the sources, they bring the
data set collected by the QAs to the matching knowledge base, where a conflation algorithm
will detect changes and select the best information.

Since conflation algorithms often deal with complicated real-world problems,
the conflation process is regarded as a practical exercise. Due to the development
of the Internet/Intranet and Open GIS, there are great opportunities to get data from
different data providers. One of the promising applications in GIS is to integrate data
from different sources. However, how to extract useful information and combine this
information to meet specific requirements is still a great challenge. Some research
results show that a potential solution for GIS is to use an integrated hybrid of advanced
technologies (e.g., artificial neural networks, fuzzy logic, and evidential reasoning)
to help automatically integrate, reconcile, and update multiple disparate geo-spatial
databases in near real time.

For the distributed environment shown in Figure 3.2, by taking advantage of
artificial intelligence technologies, a general conceptual model for conflation
can be drawn as shown in Figure 3.3.

Figure 3.3: A conceptual model for distributed conflation


The process can be briefly described as follows. The query manager retrieves all
feature objects from the distributed databases that store the related information, and
collects the objects. Any object determined to be a match is placed in the matching
feature set for conflating, and ranked according to similarity scores. The Conflation
Agent launched from the central database then travels to the matching feature set to
perform the conflation process.
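
The step of building a ranked matching feature set can be sketched as follows. This is
only an illustration: the text does not specify how similarity scores are computed, so the
scores, the `min_score` cut-off, and the feature identifiers are hypothetical.

```python
def build_matching_set(candidates, min_score=0.5):
    """Keep candidates judged to be matches and rank them by similarity score.

    `candidates` is a list of (feature_id, similarity_score) pairs; the score
    is assumed to lie in [0, 1], with 1 meaning identical.
    """
    matches = [c for c in candidates if c[1] >= min_score]
    return sorted(matches, key=lambda c: c[1], reverse=True)


candidates = [("road_17", 0.62), ("road_03", 0.91), ("river_02", 0.20)]
matching_set = build_matching_set(candidates)
```

A Conflation Agent would then process `matching_set` in order, starting with the most
similar candidate.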

3.3  A  Raster-Based  Vector  Conflation  Model 

From the previous work, the following considerations are helpful
for designing a conflation scheme.

•  The  conflation  is  most  commonly  done  with  the  vector  map. 

•  Most  of  the  leading  algorithms  for  image  change  detection  are  based  on  the  raster 
data. 

•  The  conflation  of  vector  data,  itself,  is  generally  a  costly  and  time-consuming 
task  involving  feature-matching  between  two  data  sets  and  then  transferring  the 
feature’s  attributes  from  one  data  set  to  the  other. 

• Since performing conflation or change detection requires the same data scale or
model, a transformation between models is unavoidable during conflation
and change detection. However, the transformation between vector and raster is
not a quick process. In particular, vectorization, which generates vector data
from the original image, is expensive and time consuming.

Based on the above considerations, a raster-based vector conflation model is developed
as shown in Figure 3.4. The principal purpose is to combine a vector-version conflation
algorithm with raster-based change detection. A general scheme of this conflation model
includes the following steps:

• Design knowledge bases for conflation based on the vector data;

•  Design  knowledge  bases  for  change  detection  by  means  of  the  raster  data  coming 
from  near  real-time  satellite  image; 

•  Transform  vector  data  to  the  raster  image; 

• Develop a vector-version conflation algorithm;

• Develop an algorithm for change detection that detects significant changes in the
information; and,

•  Vectorize  raster  data  -  this  process  will  involve  finding  a  set  of  pixels  that  can 
approximate  points,  lines,  and  polygons  of  the  vector  data. 

The  final  step  is  to  select  one  of  the  conflation  algorithms  according  to  the  results  of 
the  raster-based  image  change  detection.  If  no  change  occurs,  it  is  a  regular  conflation 
process.  Otherwise,  it  will  perform  the  raster-based  vector  conflation. 
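
The selection rule in the final step can be written down directly. The sketch below only
encodes the decision; the region names are hypothetical, and the conflation routines
themselves are outside its scope.

```python
def select_conflation(change_detected):
    """Return the name of the conflation path to run for one region."""
    if change_detected:
        return "raster-based vector conflation"
    return "regular conflation"


def plan(regions):
    """Map each region name to its conflation path; `regions` maps name -> change flag."""
    return {name: select_conflation(flag) for name, flag in regions.items()}


paths = plan({"harbor": True, "airfield": False})
```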

Each of the above steps requires sufficient study in order to implement
an appropriate and practical conflation process. However, the focus of this
dissertation will be placed specifically on:

• Conflation, both spatial and non-spatial, based on VPF databases.

•  Image  change  detection  in  terms  of  satellite  images. 

Moreover, uncertainty in geospatial data is a long-standing problem; it has been
the subject of a growing volume of research during the past decade and continues to
figure prominently in research agendas [20]. The dissertation will devote special effort
to dealing with uncertainty.


3.4  Significance  of  the  New  Model 


The raster-based vector scheme in Figure 3.4 is developed for the vector
data format and can handle changes over time by means of raster data. It can work in
different ways, such as:


•  Pre-detection/Post-conflation: 

In this mode, image change detection is done at each vector database before
conflation. This is the case in which every database has satellite image
information available for updating.

• Pre-conflation/Post-detection:

In this mode, regular conflation goes first. After conflation among the multiple
vector databases, the “best” one is selected for image change
detection against the raster image. If no changes have occurred, the scheme works in
exactly the same way as the one shown in Figure 3.3. Otherwise, it performs a
raster-based vector conflation again.


Therefore,  the  new  model  is  a  more  general  and  flexible  conflation  model. 


Chapter  4 

A  VECTOR-BASED  CONFLATION  SCHEME 


Geographic datasets that have both rich attribution and good positional accuracy can
be developed through conflation. There is an extensive literature on conflation.
Most of the methods employ statistical tools, feature matching, and neural nets. Each
of these techniques has its own merits and demerits. One generic limitation of these
works lies in their exact matching of spatial features/attributes.

Conflation is such a large topic that mastery can only be achieved through experience.
Fuzzy logic, which has proven successful in inexact environments,
can equally be used for inexact matching in conflation problems. Recently, some
approaches using AI techniques such as fuzzy membership functions have been developed
[12, 63]. The desirable conflation algorithm mentioned before, which was proposed in
[7, 8], utilized a hierarchical rule-based approach for feature matching. But none of
these techniques considers both AI and statistical techniques. Therefore, although the
current conflation algorithms may work reasonably well on one type of conflation problem,
there is no general method.

The premise of this dissertation has been that conflation is an inexact process.
The geodata sets in conflation are defined as digital spatial files, such as VPF files,
that cover the same area, describe the same information, and may vary in density and
accuracy. In such a dataset, three types of geographic objects, i.e., points, lines, and
polygons, are used to simplify and symbolize the complex real world, and the attribute
information associated with objects gives the objects meaning and distinguishes them
from one another. The purposes of conflation include (1) increasing spatial accuracy and
consistency; (2) adding new spatial features to datasets; and (3) updating and/or adding
attributes. The intention is to develop a conflation scheme in which components
can be integrated to solve any specific conflation problem.

4.1  Conflation  Components 

A number of real-world problems are complicated. Because of different representations
of the real world, such as different scales, different data standards, different
classifications, and different semantic meanings, some matching criteria are good for
one case but not for other cases. It is difficult to use a single algorithm for dealing with
all conflation problems. Component-ware technology provides a way to solve this
problem.

In contrast with traditional database applications, GIS applications require both
spatial and non-spatial data. Within vector databases or datasets, there are three basic
types of information for features: geometric properties (location), topological
properties (relationships), and non-spatial properties (attributes). The Vector Product
Format (VPF) specifies a georelational framework based on a vector data model that
is suitable for large geographic databases. In such a large database, inconsistencies
may arise in the geometric, topological, and attribute aspects. It is reasonable to
consider conflation in each aspect. By means of component-ware technology, which
is a method for developing independent but interoperable components,
the conflation components in a vector-based database can be designed as follows. It is
desirable to assemble these components to solve any specific problem.

4.1.1  Geometric  Conflation  Component 

Geometry  helps  us  to  understand  the  representation  of  the  spatial  position  including 
its  shape  and  size.  With  vector  GIS,  geographic  objects  are  represented  geometrically 
by  points,  lines,  or  polygons  [5]. 

Generally, the need for geometric conflation on images arises from the fact that we
want to “force” the object coordinates to fit designated coordinates. This is required
in order to measure the position, size, distance, and other geometric parameters of
the objects.

• Points: Objects that occupy very little or no areal extent are often represented
as points. Thus, a point is a 0-dimensional geometric primitive. The
spatial position of a point is described by one set of coordinates referring to one
georeference system.

•  Lines:  A  line  is  a  bounded  continuous  1-dimensional  geometric  primitive.  Linear 
features  are  best  described  as  lines. 

• Areas or regions: An area is a bounded continuous 2-dimensional geometric
primitive. A boundary is a closed 1-dimensional non-intersecting element defined
by a boundary line. Within a vector-based database, areas or regions are usually
structured as polygons [32].

As a whole, points, lines, and polygons are referred to as geographic objects since
they represent geographic features. The geometric conflation component will be used to deal
with spatial object inconsistencies.

4.1.2  Attribute  Conflation  Component 

Metadata contain a wide range of information about the image data to assist users in
determining the availability, quality, and usefulness of the data. Attribute information
is one of the major kinds of information in the metadata.

Attributes give meaning to the geographic features, for example, the color of
a building or the width of a road. There are four types of attributes [5]:

•  Quantitative  (also  called  continuous)  attributes,  which  correspond  to  quantities 
that  can  be  measured  in  a  given  unit.  Examples  include  width,  temperature, 
height; 

•  Qualitative  (also  called  discrete)  attributes,  which  do  not  correspond  to  quantities 
but  usually  to  a  finite  set  of  values  that  can  be  enumerated.  Examples  include 
classes,  codes; 

•  Geometry;  and, 

•  Description,  which  includes  text,  graphs,  photographs. 

The attribute conflation component will be used to match features. Attribute
conflation algorithms are also referred to as semantic methods. They can
match features very efficiently if both datasets define a common attribute field and
the semantics of both datasets are known. Little research has been done on this aspect.
Usually, attribute conflation is implemented at the metadata level.

4.1.3  Topological  Conflation  Component 

Topology is the study of the characteristics of geometrical objects that are independent
of the underlying coordinate system. Usually, topological relationships express the
concepts of inclusion and neighborhood. The main purpose of providing topological
information in GIS is to improve spatial analysis capabilities.

Currently, VPF supports four levels of topology:

• Level 0 — boundary representation only; no intersections of lines or areas are
considered.

• Level 1 — non-planar graph, commonly referred to as “spaghetti”; it is suitable
for representing networks.

•  Level  2  —  planar  graph,  in  which  no  edges  overlap. 

•  Level  3  —  full  topology,  in  which  no  faces  overlap. 

With respect to full topology, the topological conflation component will be used to narrow
the search range for attribute conflation and to check the results of geometric matching.
It requires topological information, such as connectivity and adjacency, to be available.
Topological methods are seldom used alone; sometimes they are aided by a geometrical method.


4.2  Intelligent  Conflation  Algorithms 

Knowledge  plays  a  critical  role  in  an  intelligent  system.  The  analysis  of  geographic 
data  requires  a  large  body  of  knowledge  about  geographic  properties.  Therefore,  more 
attention  should  be  paid  to  knowledge  structure  and  knowledge  processing. 

Within VPF databases, conflation will be performed on two coverages. “Coverage”
is a key term that refers to the set of all geospatial features for which topological
relationships have been established within a specified geographic area. Arc/Info defines
a coverage as [15]:

“It  (a  coverage)  generally  represents  a  single  set  of  geographic  objects  such  as  roads, 
parcels,  soil  units,  or  forest  stands  in  a  given  area.  A  coverage  supports  the  georelational 
model  -  it  contains  both  the  spatial  (location)  and  attribute  (descriptive)  data  for 
geographic  features.” 

Through the conflation process, differences in the features’ geographic locations and
attribute values are reconciled. Features that don’t have corresponding features in the
other source are identified and can be added. In this section, the extraction of knowledge
from the coverage is described and the corresponding conflation algorithms are discussed.

4.2.1 Conflation Algorithm for Topology

Precise vertex-to-vertex and line-to-line matching is generally not possible. The
topological conflation algorithm requires an available topology. Within the VPF databases,
the “winged-edge” format is one of several ways of representing full topology, and is
the one specified for level-3 coverages. In the “winged-edge” topological construct,
each edge is connected to two of its neighboring edges; a neighboring edge is
any edge that shares a start or end node with the original edge. Here, an edge has a
start node, which is connected to the left edge, and an end node, which is connected
to the right edge. Thus, the topological structure of a particular area can be regarded
as a network topology that can be represented by linguistic variables. Since a natural
description of the network topology, such as round shape or large size, is much
better than a numerical or formula expression, this representation is more in agreement
with the way a person might describe some of these attributes.
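
The winged-edge construct described above can be sketched as a small record type. The
field names are illustrative assumptions and the record is deliberately simplified; it is
not the VPF edge table definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Edge:
    edge_id: int
    start_node: int
    end_node: int
    left_edge: Optional[int] = None   # neighbouring edge connected at start_node
    right_edge: Optional[int] = None  # neighbouring edge connected at end_node


def shares_node(a, b):
    """True if two edges have a start or end node in common."""
    return bool({a.start_node, a.end_node} & {b.start_node, b.end_node})


def neighbours(edge, edges):
    """Return the left and right neighbouring edges that are recorded for `edge`."""
    return [edges[i] for i in (edge.left_edge, edge.right_edge) if i is not None]


# A triangle formed by three edges: 10 -> 11 -> 12 -> 10.
edges = {
    1: Edge(1, start_node=10, end_node=11, left_edge=3, right_edge=2),
    2: Edge(2, start_node=11, end_node=12, left_edge=1, right_edge=3),
    3: Edge(3, start_node=12, end_node=10, left_edge=2, right_edge=1),
}
```

Walking the `left_edge` and `right_edge` references from any edge traverses the network
without consulting coordinates, which is what makes the structure useful for topological
matching.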

Based on the idea of using linguistic variables to interpret the characteristics
of the topology, it is possible to design an intelligent conflation algorithm by means of
fuzzy logic and inexact reasoning methods. However, when topology is not available
in the source datasets, it is necessary to build a topology.

4.2.2  Conflation  Algorithm  for  Geometry 

Geometric  distortions  commonly  occur  in  source  data  due  to  imperfect  registration, 
lack  of  geodetic  control,  and  a  variety  of  other  causes.  Rubber  sheeting  will  be  used  to 
correct  flaws  through  the  geometric  adjustment  of  coordinates. 

Assume that one of the two coverages is identified as being more spatially
accurate. This coverage is referred to as the reference coverage. The geometry of the
reference coverage is not modified during the conflation process. The other, less spatially
accurate coverage is transformed via rubber-sheeting to match the reference geometry
during conflation.

With vector GIS, geographic features are represented geometrically by points, lines,
and/or polygons in a two-dimensional space. For these features, there exist different
matching types, such as point-to-point, point-to-line, and line-to-line matching, as shown
in Figure 4.1.

Figure 4.1: Spatial Feature Matching Types

Generally, different conflation approaches use different criteria for matching.
The common geometric criteria can be briefly described as follows:

Euclidean  distance:  used  to  calculate  the  distance  from  point  to  point,  point  to  line, 
etc. 

Given two points $P_1(x_1, y_1)$ and $P_2(x_2, y_2)$, the Euclidean
distance $D$ is given by

$$D = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}. \qquad (4.1)$$

Hausdorff  distance:  used  to  calculate  distance  between  two  linear  features  to  search 
for  line-to-line  matches. 

$$D_H = \max(d_1, d_2), \qquad (4.2)$$

where $d_1$ denotes the largest minimum distance from line 1 to line 2 and $d_2$ is the
largest minimum distance from line 2 to line 1.
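
Equations (4.1) and (4.2) can be sketched in a few lines. As a simplifying assumption,
the Hausdorff distance below is evaluated over the vertices of the two lines only, which
approximates the distance between the full line geometries.

```python
import math


def euclidean(p, q):
    """Equation (4.1): distance between points p = (x1, y1) and q = (x2, y2)."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def directed_distance(line_a, line_b):
    """Largest of the minimum distances from vertices of line_a to vertices of line_b."""
    return max(min(euclidean(p, q) for q in line_b) for p in line_a)


def hausdorff(line_a, line_b):
    """Equation (4.2): D_H = max(d1, d2)."""
    return max(directed_distance(line_a, line_b), directed_distance(line_b, line_a))


line1 = [(0, 0), (1, 0), (2, 0)]
line2 = [(0, 1), (1, 1), (2, 3)]
d1 = directed_distance(line1, line2)
d2 = directed_distance(line2, line1)
```

The two directed distances differ here, which is why the maximum of both is taken:
a small `d1` alone would hide the outlying vertex (2, 3) of `line2`.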

Fourier descriptor: used for the polygon shape feature.

Fourier descriptors are a series of Fourier coefficients that define the polygon boundary.
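
One common way to obtain such coefficients, shown here as an assumption rather than as
the formulation used in the cited work, is to treat each boundary vertex as a complex
number and take the discrete Fourier transform. Dropping the zero-frequency term and
normalizing the magnitudes gives a signature that does not depend on position or size.

```python
import cmath


def fourier_descriptors(boundary):
    """Discrete Fourier coefficients of a closed boundary given as (x, y) vertices."""
    n = len(boundary)
    z = [complex(x, y) for x, y in boundary]
    return [
        sum(z[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n)) / n
        for u in range(n)
    ]


def shape_signature(boundary):
    """Coefficient magnitudes without the u = 0 term, scaled by the largest magnitude."""
    if len(boundary) < 3:
        raise ValueError("a polygon boundary needs at least three vertices")
    magnitudes = [abs(c) for c in fourier_descriptors(boundary)[1:]]
    scale = max(magnitudes)
    if scale == 0:
        raise ValueError("degenerate boundary: all vertices coincide")
    return [m / scale for m in magnitudes]


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = [(10, 5), (13, 5), (13, 8), (10, 8)]  # same shape, translated and scaled by 3
sig_a = shape_signature(square)
sig_b = shape_signature(moved)
```

Two polygons can then be compared by the distance between their signatures, provided
both boundaries are sampled with the same number of vertices.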

If  several  matches  are  found  by  distance  criteria,  additional  criteria  such  as  angular 
criteria  may  be  needed.  However,  the  geometric  conflation  method  requires  that  two 
datasets  have  similarity  in  geometric  location.  If  necessary,  rubber  sheeting  may  be 
involved  for  position  alignment. 

Geometry conflation is carried out by performing the following matching steps:

1. Node pairs are searched for between the two coverages.

2. Matched node pairs are used to generate a rubber-sheeting transformation that
brings the other coverage into better alignment with the reference coverage.

Note: Rubber-sheeting and node matching proceed iteratively. Each iteration
produces a new transformation that may bring the coverages into better
alignment.

3. Line matching proceeds once node matching has been completed.

4. The next step is feature merging. The features that have no corresponding feature
in the other source will be merged.

Here, node matching is performed to create the rubber-sheeting transformations. Distance
measures are used for matching nodes.

•  Point  to  Point:  Euclidean  distance  matching. 

•  Point  to  Line:  Point  to  line  distance. 

•  Point  to  Polygon:  Point  in  polygon. 

•  Line  to  Line:  Hausdorff  distance  matching. 

•  Line  to  Polygon:  Line  to  polygon  matching  can  be  converted  into  line  to  polygon 
boundary  matching. 

•  Polygon  to  Polygon:  Polygon  centroid,  point  in  polygon,  polygon  shape  feature. 
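Two of the criteria in the list above, point-to-line distance and point in polygon, can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, lines and polygons are assumed to be vertex lists, and the point-in-polygon test uses the standard ray-casting technique, which the text does not prescribe.

```python
import math

def point_to_segment(p, a, b):
    """Distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_to_line(p, line):
    """Distance from point p to a polyline given as a list of vertices."""
    return min(point_to_segment(p, line[i], line[i + 1])
               for i in range(len(line) - 1))

def point_in_polygon(p, polygon):
    """Ray casting: toggle 'inside' at each edge crossed by a ray to the right of p."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

square = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
print(point_to_line((1.0, 1.0), [(0.0, 0.0), (2.0, 0.0)]))
print(point_in_polygon((2.0, 2.0), square))
```

Line-to-polygon matching can reuse `point_to_line` by treating the polygon boundary as a closed polyline, which follows the conversion described in the list.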

In  general,  the  conflation  match  results  will  be  more  accurate  when  two  sources  are 
similar. 


4.2.3  An  Intelligent  Algorithm  for  Attribute  Conflation 

The attributes associated with features help to describe their unique characteristics.
The attributes of spatial features may have something in common but have different
semantic definitions. In VPF, an attribute table collects the identically defined attribute
rows [40]. An example of an attribute table is shown in Table 4.1.

Our treatment of attribute conflation focuses on the intelligent attribute matching
algorithm [7]. For attribute matching, each feature object is considered as a set of
attribute-value pairs, that is:
Table 4.1: State Attribute Table

ID          State             Area (sq. mi.)   Total Population
implicit    character string  binary integer   binary integer
UNIQUE-KEY  PRIMARY-KEY       NON-UNIQUE       NON-UNIQUE
1           California        158706           26365000
2           New York          110561           936000
3           Utah              84899            1645000

$$\{(a_{11}, v_{11}), (a_{12}, v_{12}), \ldots, (a_{1n}, v_{1n})\}$$

$$\{(a_{21}, v_{21}), (a_{22}, v_{22}), \ldots, (a_{2m}, v_{2m})\}$$

Here, two categories of attribute domains, numeric and linguistic, are considered
for matching. In general, matching for numeric domains is handled through the use
of membership matching functions, while matching for linguistic domains is handled
through the use of attribute similarity tables. As an example, the linguistic domain is
discussed as follows.

A  similarity  table  for  a  specific  attribute  contains  a  value  in  the  range  [0, 1]  for  each 
attribute  domain  value.  Each  of  these  values  represents  a  degree  of  matching  between 
two  attribute  values.  In  many  cases,  the  domain  values  are  integers  that  represent 
encodings  of  linguistic  characteristics;  thus,  the  similarity  values  in  the  table  represent 
similarity  between  linguistic  terms.  Matching  for  features  based  on  attribute  similarity 
is  a  two-phase  process. 

•  First, the similarity between each of the attribute values for two features is
determined from a similarity table.

•  Second,  measures  of  semantic  interrelationships  between  and  among  the  various 
attribute  values  are  computed  within  a  rule-based  expert  system.  Based  on  these 
interrelationships,  the  expert  system  returns  one  or  more  weights  for  increasing 
or  decreasing  the  matching  score  for  various  attributes. 
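The two-phase process above can be sketched as follows. The sketch rests on several assumptions that are not in the original: the similarity table contents, the attribute names, and the function names are all hypothetical, and the rule-based expert system is replaced by a plain dictionary of weights that the caller supplies. The weighted average used to combine the per-attribute similarities is likewise only one possible choice.

```python
# Hypothetical similarity table for one linguistic attribute ("surface").
SURFACE_SIMILARITY = {
    ("paved", "paved"): 1.0,
    ("paved", "gravel"): 0.4,
    ("paved", "dirt"): 0.1,
    ("gravel", "gravel"): 1.0,
    ("gravel", "dirt"): 0.7,
    ("dirt", "dirt"): 1.0,
}

def lookup(table, v1, v2):
    """Symmetric table lookup; pairs missing from the table get similarity 0."""
    return table.get((v1, v2), table.get((v2, v1), 0.0))

def match_score(f1, f2, tables, weights=None):
    """Phase 1: per-attribute similarity. Phase 2: weighted combination.

    'weights' stands in for the output of the expert system (assumption);
    attributes without an explicit weight use 1.0.
    """
    weights = weights or {}
    shared = [a for a in f1 if a in f2 and a in tables]
    total = sum(weights.get(a, 1.0) for a in shared)
    if total == 0:
        return 0.0
    score = sum(weights.get(a, 1.0) * lookup(tables[a], f1[a], f2[a])
                for a in shared)
    return score / total

tables = {"surface": SURFACE_SIMILARITY, "use": {("road", "road"): 1.0}}
feature_1 = {"surface": "paved", "use": "road"}
feature_2 = {"surface": "gravel", "use": "road"}
print(match_score(feature_1, feature_2, tables))
print(match_score(feature_1, feature_2, tables, weights={"surface": 3.0}))
```

Raising the weight of the "surface" attribute lowers the overall score here, because that is the attribute on which the two features disagree.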


Chapter  5 

IMAGE  CHANGE  DETECTION 


The integration of image data into GIS is one of the great ideas whose time has come.
As an active research field in GIS, image change detection is considered a potential
application of the high-resolution imagery now available from commercial earth
observation satellites. This study exploits remotely sensed images of the same scene
for image change detection.

Although image change detection techniques based on statistical theory have
been applied in many applications, they do not yield much useful information on the
nature of the change. Normally, they indicate only that a statistically significant
change has occurred somewhere in the region under examination. Because of uncertainty
in the image processing, the changes are then difficult to interpret. Therefore,
the results of the change analysis can be very complex and unclear at a glance.

In recent years, some promising techniques for change detection have emerged
from areas such as computer vision, image understanding, knowledge-based systems,
and fuzzy set theory. Fuzzy logic is a way of thinking that seems particularly
appropriate for this purpose. Some studies reported that a rule-based fuzzy logic
approach gave better results than a maximum likelihood classifier. These techniques
offer some promise for developing new algorithms for image change detection.

A survey of current detection methods shows that most previous work used
just one change detection technique, applied to a particular problem in a
particular area. In this chapter, a hybrid method, which takes advantage of both
statistical analysis tools and fuzzy theory, is investigated in order to deal with real-time
image change. The main idea is two-step change detection: a pre-detection
technique (such as correlation analysis or histogram analysis) determines which of the
changes are significant, and a post-detection technique provides more information
about how far these changes can be trusted.

Human understanding of an image is the result of reasoning in terms of
knowledge. This knowledge necessarily includes uncertainty and fuzziness, which reflect
the incompleteness and imprecision inherent in any human knowledge of the real world.
The main effort of the dissertation is to investigate an inferencing approach that can
perform fuzzy inference under uncertainty. By introducing a Certainty Factor (CF), a
hierarchical inferencing structure is proposed. The basic concepts related to raster
images and fuzzy sets are also provided in this chapter.


5.1  Basic  Concepts  for  Raster  Data 

The raster model is one of the major families of representation models for images.
It divides the region into rectangular building blocks (grid cells or pixels) that are filled
with the measured attribute values.


Figure 5.1: Discrete image I (columns indexed 0, 1, ..., j, ..., n along x)


Figure 5.1 shows a discrete n x n image I in the raster format. The position of pixel (i, j)
is given in the common notation for matrices. The first index, i, denotes the position
of the row; the second, j, the position of the column [24]. $I_{ij}$ represents the brightness
of the image in the corresponding cell (i, j).

Pixels are elementary units. As long as the statistical properties of a pixel do not
depend on its neighboring pixels, the classical concepts of statistics can be applied. Based
on the raster model, we consider an 8-bit gray image that has been partitioned into $n^2$
non-overlapping blocks of equal dimensions. Some measures of the distribution of the
pixel values in an image, and a set of image features such as edge, shade and mixed
range, can be defined as follows.
5.1.1  Common  Statistic  Measures

An image can be regarded as a function $I: \mathbb{R}^n \to \mathbb{R}^m$, where normally n is 2 and m
is 1 for intensity or 3 for color. If m = 1, the value $I_{ij}$ is called the gray level of pixel
(i, j). If m > 1, $I_{ij}$ is referred to as a feature vector. In this study, the gray values will be
considered.

Considering that a pixel is a random variable like any other measured quantity,
first-order statistics such as the mean and the variance, or standard deviation [38, 39],
can be defined as follows.


Definition 5.1.1. The Mean of the pixel values in an image, $I_{avg}$, is given by

$$I_{avg} = \frac{1}{n^2} \sum_{i,j} I_{ij}. \tag{5.1}$$


Definition 5.1.2. The Standard deviation of the pixel values in an image, $\sigma$, is given
through its square by

$$\sigma^2 = \frac{1}{n^2} \sum_{i,j} (I_{ij} - I_{avg})^2. \tag{5.2}$$


The  standard  deviation,  or  more  generally,  the  second  moment,  is  one  measure  for  the 
degree  of  smoothing.  This  measure  can  be  applied  in  the  spatial  domain. 
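As a small numerical illustration of the mean and standard deviation definitions, the following sketch computes both for a tiny image. It assumes the image is a list of rows of gray values and that the mean squared deviation is divided by the number of pixels ($n^2$), which is the population form; the function names are hypothetical.

```python
import math

def image_mean(image):
    """Mean gray level of an image given as a list of rows."""
    pixels = [v for row in image for v in row]
    return sum(pixels) / len(pixels)

def image_std(image):
    """Population standard deviation: square root of the mean squared deviation."""
    mean = image_mean(image)
    pixels = [v for row in image for v in row]
    return math.sqrt(sum((v - mean) ** 2 for v in pixels) / len(pixels))

image = [[10, 20],
         [30, 40]]
print(image_mean(image))
print(image_std(image))
```

A perfectly uniform image has a standard deviation of zero, which is the sense in which this quantity measures the degree of smoothing.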


Definition 5.1.3. Correlation R. Given a pair of images of the same area acquired at
different times, correlation analysis starts by computing the correlation coefficient for a
local neighborhood of n x n pixels. The value of the correlation coefficient R can be
calculated as:

$$R = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} (a_{i,j} - a_{avg})(b_{i,j} - b_{avg})}{n^2 \sigma_a \sigma_b}, \tag{5.3}$$

where $a_{i,j}$ and $b_{i,j}$ are pixel intensity values in the first and second images, and
$a_{avg}$, $\sigma_a$, $b_{avg}$, $\sigma_b$ are the corresponding means and standard deviations of the intensity.
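The correlation coefficient can be sketched directly from this definition. The code below is an illustration only: it assumes both neighborhoods are lists of rows of equal size, uses population standard deviations (division by the number of pixels), and raises an error for a constant neighborhood, where the coefficient is undefined. The function name is hypothetical.

```python
import math

def correlation(a, b):
    """Correlation coefficient R between two equally sized gray-level windows."""
    pa = [v for row in a for v in row]
    pb = [v for row in b for v in row]
    if len(pa) != len(pb):
        raise ValueError("windows must have the same number of pixels")
    n2 = len(pa)
    a_avg = sum(pa) / n2
    b_avg = sum(pb) / n2
    sigma_a = math.sqrt(sum((v - a_avg) ** 2 for v in pa) / n2)
    sigma_b = math.sqrt(sum((v - b_avg) ** 2 for v in pb) / n2)
    if sigma_a == 0.0 or sigma_b == 0.0:
        raise ValueError("correlation is undefined for a constant window")
    cross = sum((x - a_avg) * (y - b_avg) for x, y in zip(pa, pb))
    return cross / (n2 * sigma_a * sigma_b)

first = [[10, 20], [30, 40]]
brighter = [[15, 25], [35, 45]]   # same scene with a uniform brightness offset
inverted = [[40, 30], [20, 10]]
print(correlation(first, brighter))
print(correlation(first, inverted))
```

Because the means are subtracted, a uniform brightness offset between the two windows leaves R unchanged, so such a change cannot be seen through R alone.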


Definition 5.1.4. Image histogram, $h(I_{ij})$. A histogram of an image is a list which
contains as many elements as quantization levels. Each element stores the number of
pixels that have the corresponding gray value. The domain of the histogram is the
set of possible pixel gray values (quantization levels).

Individual image data are typically quantized with brightness values ranging over
$2^8$ to $2^{12}$ levels. The majority of current data are quantized to 8 bits, with values
ranging from 0 to 255 [25].

Thus, a histogram can also be defined as a frequency distribution graph of a set of
numbers, that is, a graph with “pixel value (or brightness)” on the horizontal axis from
0 to 255 and the “number of pixels” on the vertical axis.

Tabulating the frequency of occurrence of each brightness value within the image
provides the statistical information that can be displayed graphically in a histogram.
Figure 5.2 shows an example of calculating and plotting the histogram of an image.

Figure 5.2: A 1-D histogram
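The tabulation described above amounts to counting pixels per gray level. A minimal sketch, assuming an 8-bit image stored as a list of rows of integers in the range 0 to 255 (the function name is hypothetical):

```python
def histogram(image, levels=256):
    """Return a list whose element g is the number of pixels with gray value g."""
    counts = [0] * levels
    for row in image:
        for value in row:
            counts[value] += 1
    return counts

image = [[0, 1, 1],
         [2, 1, 0],
         [255, 2, 1]]
h = histogram(image)
print(h[0], h[1], h[2], h[255])
```

The list has one element per quantization level, and its elements sum to the number of pixels in the image.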


5.1.2  Image  Features 

The extraction of image features requires neighborhood operators that are sensitive
to changes in gray value and insensitive to constant gray values. The basis of image
feature extraction is differentiation. In discrete images, differentiation has to be
approximated by differences between neighboring pixels.

Definition  5.1.5.  An  edge  is  a  contour  of  pixels  of  large  gradient  with  respect  to  its 
neighbors  in  an  image. 

Definition  5.1.6.  A  shade  is  a  region  over  an  image  with  a  small  or  no  variation  of 
gray  levels. 
Definition  5.1.7.  A  mixed  range  is  a  region  excluding  edges  and  shades  on  a  given
image.

Definition  5.1.8.  The  gradient  [46]  at  a  pixel  (i,j)  in  an  image  is  estimated  by  taking 
the  square  root  of  the  sum  of  difference  of  gray  levels  of  neighboring  pixels  with  respect 
to  the  pixel 

= 3(  E  -  G<>))! + £<  E  -  G«»2-  («) 

k—i—1  k=j — 1 

Definition 5.1.9. The gradient median $G_{medi}$ within a partitioned block is defined
as the middle gradient value in that block when all gradient values are arranged in
ascending or descending order. It is another measure of central tendency.

Definition 5.1.10. The gradient average $G_{avg}$ within a block is defined as the average
of the gradients of all pixels within that block,

$$G_{avg} = \sum G_{ij} \cdot P(G_{ij}), \tag{5.5}$$

where $G_{ij}$ denotes the gradient values at pixels, and $P(G_{ij})$ represents the probability
of the particular gradient $G_{ij}$ in that block [62].

Definition 5.1.11. The variance $\sigma^2$ of the gradient is defined as the arithmetic mean
of the squared deviations from the mean. It is expressed formally as

$$\sigma^2 = \sum (G_{ij} - G_{avg})^2 \cdot P(G_{ij}). \tag{5.6}$$
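The three block-level gradient measures (median, average, and variance) can be sketched as follows. The sketch assumes the gradient values of one block are already available as a list, and it estimates the probability of each gradient value as its relative frequency within the block, which is an assumption made here for illustration. The function name is hypothetical.

```python
from collections import Counter
import statistics

def gradient_stats(gradients):
    """Return (median, average, variance) of the gradient values in one block."""
    n = len(gradients)
    # P(G): relative frequency of each distinct gradient value in the block.
    prob = {g: count / n for g, count in Counter(gradients).items()}
    g_avg = sum(g * p for g, p in prob.items())
    variance = sum((g - g_avg) ** 2 * p for g, p in prob.items())
    # For an even number of values, statistics.median averages the two middle ones.
    g_medi = statistics.median(gradients)
    return g_medi, g_avg, variance

block_gradients = [0, 0, 2, 4, 4]
g_medi, g_avg, variance = gradient_stats(block_gradients)
print(g_medi, g_avg, variance)
```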


5.2  Basic  Concepts  of  Fuzzy  sets 

Dealing with inexact problems is a large part of human experience. In 1965,
Zadeh introduced the idea of ‘fuzzy sets’ to deal with inexact concepts in a definable
way. Since the 1960s, the theory of fuzzy sets has been developed to the point where
useful, practical tools are available for use in other disciplines. Fuzziness is often a
concomitant of complexity. It is appropriate to use fuzzy sets whenever we have to deal
with ambiguity, vagueness and ambivalence in mathematical or conceptual models of
empirical phenomena.
5.2.1  Fuzzy  Sets

If X is a collection of objects denoted generally by x, then a fuzzy set A in X is
defined as a set of ordered pairs:

$$A = \{(x, \mu_A(x)) \mid x \in X\}, \tag{5.7}$$

where $\mu_A(x)$ is called the membership function (or MF for short) for the fuzzy set A. It
denotes the degree to which a variable x belongs to A, where A is a subset
of a universal set X. Usually, X is referred to as the universe of discourse, or simply
the universe.


5.2.2  Membership  functions 

A fuzzy set is completely characterized by its membership function (MF). Put simply,
in fuzzy sets, the grade of membership is expressed in terms of a scale that can vary
continuously between 0 and 1. A more convenient and concise way to define the MF is
to express it as a mathematical formula.

Due to its smoothness and concise notation, the Gaussian function is becoming
increasingly popular for specifying a fuzzy set. Moreover, Gaussian functions are well known
in probability and statistics, and they possess useful properties such as invariance under
multiplication and Fourier transform.

Definition 5.2.1. A Gaussian MF is specified by two parameters, $\{c, \sigma\}$, i.e.

$$\mathrm{Gaussian}(x; c, \sigma) = e^{-\frac{1}{2}\left(\frac{x - c}{\sigma}\right)^2}, \tag{5.8}$$

where $c$ represents the MF's center and $\sigma$ determines the MF's width.

Figure  5.3  plots  a  Gaussian  MF  defined  by  Gaussian(x;5,2). 
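The Gaussian MF of Definition 5.2.1 can be evaluated with a short function. The sketch assumes the usual form $e^{-\frac{1}{2}((x - c)/\sigma)^2}$ for Equation 5.8 and uses the parameters of Figure 5.3 (c = 5, σ = 2); the function name is hypothetical.

```python
import math

def gaussian_mf(x, c, sigma):
    """Gaussian membership function with center c and width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

# Sample the MF of Figure 5.3, Gaussian(x; 5, 2), at a few points.
for x in (1, 3, 5, 7, 9):
    print(x, round(gaussian_mf(x, 5, 2), 4))
```

The membership grade is 1 at the center and falls off symmetrically on both sides, staying within the interval (0, 1].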


5.2.3  Set-Theoretic  Operations

Union and intersection are the most basic operations on classical sets. Corresponding
to the ordinary set operations of union and intersection, fuzzy sets have similar
operations, which were initially defined in Zadeh’s seminal paper [29].

Figure  5.3:  Gaussian  membership  function

Definition 5.2.2. Union (disjunction)

The union of two fuzzy sets A and B is a fuzzy set C, written as C = A ∪ B or C = A
OR B, whose MF is related to those of A and B by

$$\mu_C(x) = \max(\mu_A(x), \mu_B(x)). \tag{5.9}$$

Definition 5.2.3. Intersection (conjunction)

The intersection of two fuzzy sets A and B is a fuzzy set C, written as C = A ∩ B or
C = A AND B, whose MF is related to those of A and B by

$$\mu_C(x) = \min(\mu_A(x), \mu_B(x)). \tag{5.10}$$

5.2.4  Linguistic  Variables  and  Other  Related  Terminology

Knowledge in the fuzzy model is represented by a linguistic variable, which is
characterized by fuzzy sets. A linguistic variable is characterized by a quintuple [23],

(x, T(x), X, NR, MR),

where

x is the name of the variable;

T(x) is the term set of x, that is, the set of its linguistic values or linguistic terms;

X is the universe of discourse;

NR is a syntactic rule which generates the terms in T(x); and,

MR is a semantic rule which associates with each linguistic value its meaning, that is,
a fuzzy set in X.
For example, if an image is interpreted as a linguistic variable, its term set
T(image) could be expressed as,

T (image)  =  {edge,  shade,  mixed  range}, 

where  each  term  in  T (image)  is  characterized  by  a  fuzzy  set  of  a  universe  of  discourse 
X. 

Usually  we  use  “Image  includes  edges”  to  denote  the  assignment  of  the  linguistic 
value  “edge”  to  the  linguistic  variable  “image”.  The  syntactic  rule  refers  to  the  way  the 
linguistic  values  in  the  term  set  T(image)  are  generated.  The  semantic  rule  defines  the 
membership  function  of  each  linguistic  value  of  the  term  set. 
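The semantic rule and the set operations of Section 5.2.3 can be illustrated together. In the sketch below, each linguistic value is mapped to a membership function, and union and intersection are taken as the pointwise maximum and minimum, following Zadeh's original definitions. The two piecewise-linear membership functions and all names are hypothetical and chosen only to keep the numbers simple; they are not the Gaussian functions used later in this chapter.

```python
def fuzzy_union(mu_a, mu_b):
    """MF of A OR B: pointwise maximum of the two membership functions."""
    return lambda x: max(mu_a(x), mu_b(x))

def fuzzy_intersection(mu_a, mu_b):
    """MF of A AND B: pointwise minimum of the two membership functions."""
    return lambda x: min(mu_a(x), mu_b(x))

def shade(g):
    """Hypothetical MF: full membership at gradient 0, none at 128 or above."""
    return max(0.0, 1.0 - g / 128.0)

def edge(g):
    """Hypothetical MF: no membership at gradient 0, full at 128 or above."""
    return min(1.0, max(0.0, g / 128.0))

# Semantic rule (illustrative): each linguistic value maps to its MF.
term_set = {"shade": shade, "edge": edge}

shade_or_edge = fuzzy_union(term_set["shade"], term_set["edge"])
shade_and_edge = fuzzy_intersection(term_set["shade"], term_set["edge"])
print(shade_or_edge(32), shade_and_edge(32))
```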


5.3  A  Hybrid  Approach  for  Image  Change  Detection 

Each detection technique has its own merits and demerits. For practical
applications, the reliability of an approach depends on both the measured quantity and
the capability of the approach to handle uncertainty. This research is dedicated to
developing a more flexible and reliable method for image change detection based on
satellite images. A two-step processing method is designed, consisting of pre-detection
and post-detection.

Pre-detection is a process in which changes are analyzed from the
standpoint of statistical decision theory. Post-detection is a process that
adopts fuzzy set theory to make a decision.

5.3.1  Pre-detection  —  A  Statistic  Analysis 

From the statistical point of view, some basic characteristics of change detection
techniques can be described as follows:

•  As  usual,  the  mean  of  a  distribution  is  used  as  a  measure  of  the  distribution’s 
location.  The  estimate  of  the  mean  gray  value  of  the  region,  averaging  within  a 
local  neighborhood,  appears  to  be  a  central  tool  for  region  detection.  However, 
the  mean  is  a  poor  measure  of  central  tendency  when  the  set  of  observations  is 
skewed  or  contains  an  extreme  value. 

•  The  standard  deviation  is  a  measure  of  the  absolute  dispersion.  A  small  standard 
deviation  suggests  that  pixel  values  are  clustered  tightly  around  a  central  value. 
demonstrates there is no relationship between two sets. Therefore, there is a high
correlation between image data in the regions that have not changed significantly,
and a relatively low correlation between regions that have changed substantially.
However, correlation analysis generally fails to detect a consistent change,
especially if the images are acquired under different illumination conditions.

3.  Gray  Scale  Normalization 

Changes in the intensity at pixels occur for many reasons, such as noise and
illumination variations. It is often necessary to normalize the gray scale of an image
in order to improve the reliability of feature detection or measurement. For
instance, suppose that we want to compare two images $I_1$ and $I_2$ in order to detect
differences between them. If the images were obtained under different lighting
conditions, we must somehow compensate for this, since otherwise they will have
different gray levels at every point even if they are images of the same scene.
A common method of gray-scale normalization is to standardize the image
histogram, i.e. the gray level frequency distribution. By normalization, an image is
brought into a standard form.

Consider two images $I_1$ and $I_2$, each having $n^2$ points. Suppose that $I_1$ is quantized to
k levels $l_1, l_2, \ldots, l_k$, that $I_2$ is quantized to k levels $l'_1, l'_2, \ldots, l'_k$, and that
levels $l_i$ and $l'_i$ each have $m_i$ points. Then, if illumination variations and/or white
noise are considered, the following relationship should be satisfied:

Since

$$\sum_{i=1}^{k} m_i = n^2, \tag{5.14}$$

let

$$\bar{I} = \frac{1}{n^2} \sum_{i=1}^{k} m_i l_i, \tag{5.15}$$

$$l'_i = l_i + a, \tag{5.16}$$

where $a$ is a constant.
Then,

$$\bar{I}_2 = \bar{I}_1 + a, \tag{5.17}$$

$$\sigma_1 = \sigma_2. \tag{5.18}$$

The normalization of the histogram $h_2$ of image $I_2$ can be computed by:

$$h'_2 = h_2 - a. \tag{5.19}$$

In  this  way,  the  normalization  of  the  gray  scale  makes  feature  values  independent 
of,  or  at  least  insensitive  to  noise  and  illumination. 
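A minimal sketch of this normalization is given below. It assumes the two images differ only by a constant gray-level offset $a$, estimates $a$ as the difference of the two mean gray levels, and subtracts it from the second image. The function names are hypothetical, and the result is deliberately not clipped to the 0 to 255 range so that the shift stays visible.

```python
def mean_gray(image):
    """Mean gray level of an image given as a list of rows."""
    pixels = [v for row in image for v in row]
    return sum(pixels) / len(pixels)

def normalize_to_reference(reference, image):
    """Shift 'image' so that its mean gray level matches that of 'reference'.

    Returns the shifted image and the estimated offset a (assumed constant).
    """
    a = mean_gray(image) - mean_gray(reference)
    shifted = [[v - a for v in row] for row in image]
    return shifted, a

image_1 = [[10, 20], [30, 40]]
image_2 = [[25, 35], [45, 55]]   # same scene, uniformly brighter
normalized_2, offset = normalize_to_reference(image_1, image_2)
print(offset)
print(normalized_2)
```

After the shift, both images have the same mean gray level, and their standard deviations were already equal because a constant offset does not change the spread.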

Based on the above information, the pre-detection technique can be divided into two steps:

Step 1: Correlation Analysis for Small Change

The correlation R is used to analyze small changes in the given area. A simple
rule for judgment can be given as:

if the correlation $R \in (0.9, 1]$, no change is considered.

if the correlation $R \in (0.5, 0.9]$, some changes may have happened.

Otherwise, changes have occurred.
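The judgment rule of Step 1 translates directly into a small function. The thresholds 0.9 and 0.5 are those given above; the function name and the returned labels are hypothetical.

```python
def classify_correlation(r):
    """Map a correlation coefficient R to the Step 1 judgment."""
    if 0.9 < r <= 1.0:
        return "no change"
    if 0.5 < r <= 0.9:
        return "possible change"
    return "change"

for r in (0.97, 0.9, 0.62, 0.5, -0.3):
    print(r, classify_correlation(r))
```

Note that the interval boundaries follow the half-open intervals of the rule: R = 0.9 falls into the second case and R = 0.5 into the last.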

Step  2:  Histogram  Analysis  for  Structure  Change 

Correlation analysis is poor at detecting structure change, and gray-scale underflow or
overflow is a common error that often goes unnoticed and causes a serious bias in
processing. A further step for image change detection is therefore needed.

Since the histogram of an image can clearly illustrate contrast and is multimodal in
nature, that is, each element in the histogram tends to be comprised of gray levels
different from one another, the histogram can be used as a useful and important
graphical aid to understand the information content of a satellite image.

First, the histograms of the images are normalized so that they can be compared.
Second, based on their gray level composition, the number of peaks in each
histogram is counted. Finally, since each peak corresponds to a dominant type of
object in the image, the statistical structure change can be derived by
comparing the number of peaks in the two images.

Let $NH_1$ and $NH_2$ be the number of peaks in images $I_1$ and $I_2$, respectively.
If $NH_1 \neq NH_2$, changes may have occurred, especially in structure.
Although the histogram of an image can provide useful information for detecting
structure change, this technique, like all data processing, has a drawback.
Objects in a scene may be composed of gray level regions that overlap, so objects
will have portions that fall in another’s classification.
In particular, when an unusually large number of pixels have the same brightness value,
or several objects have the same brightness value, the traditional histogram display
may not be the best way to represent the information content of an image. Therefore,
histogram-based structure change detection is only a basic technique. It works best on
simple scenes with objects that have distinctly different gray scale occupancies.
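The peak comparison of Step 2 can be sketched as follows. The text does not define what counts as a peak, so the sketch makes its own assumption: a peak is a local maximum of the histogram (a run of equal bins counts once) whose count reaches a minimum threshold. All names and the threshold are hypothetical, and a real histogram would normally need smoothing before peaks are counted.

```python
def count_peaks(hist, min_count=1):
    """Count local maxima of a histogram; a flat run of equal bins counts once."""
    peaks = 0
    n = len(hist)
    i = 0
    while i < n:
        # Extend j to the end of the run of bins equal to hist[i].
        j = i
        while j + 1 < n and hist[j + 1] == hist[i]:
            j += 1
        left = hist[i - 1] if i > 0 else -1
        right = hist[j + 1] if j + 1 < n else -1
        if hist[i] >= min_count and hist[i] > left and hist[i] > right:
            peaks += 1
        i = j + 1
    return peaks

def structure_changed(hist_1, hist_2, min_count=1):
    """Step 2 rule: a structure change is suspected when the peak counts differ."""
    return count_peaks(hist_1, min_count) != count_peaks(hist_2, min_count)

hist_1 = [0, 5, 9, 5, 0, 0, 3, 7, 7, 2]   # two dominant gray-level groups
hist_2 = [0, 2, 8, 3, 1, 0, 0, 0, 0, 0]   # one dominant gray-level group
print(count_peaks(hist_1), count_peaks(hist_2))
print(structure_changed(hist_1, hist_2))
```

The drawback discussed above shows up directly in such a sketch: two objects with overlapping gray levels merge into a single peak and are counted once.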

5.3.2  Post-detection  -  An  Intelligent  Analysis 

Since it is difficult to state exact processing for image change detection in precise
mathematical terms, most image detection algorithms are heuristic [18]. The main
concern is how to judge these changes and give reliable information about image change.
As was pointed out by Zadeh [30], conventional techniques for system analysis are
intrinsically unsuited for dealing with humanistic systems, whose behavior is strongly
influenced by human judgment, perception and emotions. Even though basic change
detection can be carried out based on statistical analysis, the results cannot be totally
trusted. This is because geographical information involves human interpretation and
knowledge, which are invariably imprecise, incomplete, or not totally reliable. Human
beings, however, are highly skilled at making decisions in such an uncertain
environment.

Zadeh proposed an alternative approach to modeling human thinking. This
approach serves to summarize information and express it in terms of fuzzy sets. Recently,
a number of approaches for image processing that use fuzzy membership functions have
been developed. A new methodology [4, 45] that uses fuzzy membership-distance as
moment descriptors for image matching gives us a clue for designing our intelligent
detection approach, since it is superior to all other existing techniques for inexact
reasoning. The intelligent detection is provided as follows.

1.  Image  as  a  Fuzzy  Subset  T(image) 

Geographical information (including satellite data) is often imprecise, meaning
that the boundaries between different phenomena are fuzzy. Exact definitions are
inadequate for dealing with geographic information. Mathematically speaking, an
ideal edge is a discontinuity of the spatial gray value function. It is obvious that
this is only an abstraction, which often does not match reality. For example,
there is usually a gradual interface at the edge of forests and range land. In
geographic space it is difficult to define a natural boundary which is sharp and
well determined. Most geographic objects seem to be an abstraction of things
which have unclear, fuzzy boundaries. Therefore, the interpretation of an image
should contain fuzzy definitions [43, 59]. Currently there are no available fuzzy
schemes to define the fuzzy subsets for image representation. This brings us to
an important consideration - to interpret the significance of an image.

The extraction of image features requires knowledge of image processing.
Generally, the partitioned blocks in an image include either edges and shades
together or mixed areas. Simply, an image can be interpreted as a fuzzy variable
that has the primary term set {shade, edge, mixed area}. Thus, the term set of
an image can be expressed as:

T (image)  =  {edge,  shade,  mixed  area}  .

This means that each block can be identified according to three possible
characteristics named ‘edge’, ‘shade’ and ‘mixed area’.

2.  Fuzzy  Membership  Functions 

In order to make change detection more sensitive to the imprecise (fuzzy) nature of
the real world, the problem of determining the appropriate membership function
has drawn the attention of many researchers in the image processing field.

From the statistical point of view, we can assume that each linguistic value, such
as shade, edge, or mixed area, corresponds to a fuzzy set whose membership function
$\mu$ is represented by a Gaussian function as shown in Equation 5.8. The degree of
membership of a given block in the typical sub-classes (edges, shades and
mixed areas) is then measured by means of basic statistical measures, for
example, the average, variance, and median of the gradients. A more challenging
problem is to determine the two parameters of the Gaussian membership function.
There is no general approach. The definition of the grades of membership
is subjective and depends on human interpretation.

Assume that objects are characterized by constant gray values. If a pixel shows
the same gray value as its neighbors, there are good reasons to believe that it lies
within a region of constant gray values. Let us consider the following cases.

Case 1: Shade Membership Function $\mu_{shade}$

When an image contains only shades, the ideal gradient average $G_{avg}$ should be
zero. By the definitions in Section 5.1.2, the gradient median $G_{medi}$ and the variance
of the gradient $\sigma^2$ are also zero ideally. It is intuitive that the degree of membership
of the image in “shade” will decrease when $G_{avg}$ increases. Based on this analysis,
it is reasonable to presume the shade membership function with respect to the
gradient average, $\mu_{shade}(G_{avg})$, as

$$\mu_{shade}(G_{avg}) = e^{-\lambda_{s1} G_{avg}^2}, \tag{5.20}$$

where $\lambda_{s1}$ is a constant, which can be determined by boundary conditions.

In the same way, the shade membership functions with respect to the parameters
$G_{medi}$ and $\sigma$ can be realized with a similar form:

$$\mu_{shade}(G_{medi}) = e^{-\lambda_{s2} G_{medi}^2}, \tag{5.21}$$

$$\mu_{shade}(\sigma) = e^{-\lambda_{s3} \sigma^2}, \tag{5.22}$$

where $\lambda_{s2}$ and $\lambda_{s3}$ are constants determined by boundary conditions.

Case 2: Edge Membership Function $\mu_{edge}$

By the definitions in Section 5.1.2, an edge in an image is a contour of pixels of large
gradient. Analogously, the average gradient of an image containing boundaries will
have a positive value $\xi$. The size of $\xi$ determines the center of the distribution.
It is difficult to pre-define this value, so a dynamic approach should be considered.
Let $\xi$ be the gradient average of a reference image. The edge membership function
with respect to $G_{avg}$ can then be represented as

$$\mu_{edge}(G_{avg}) = e^{-\lambda_{e1} (G_{avg} - \xi)^2}, \tag{5.23}$$

where $\lambda_{e1}$ is a constant.

With regard to the parameters $G_{medi}$ and $\sigma$, the edge membership functions are,
respectively,

$$\mu_{edge}(G_{medi}) = e^{-\lambda_{e2} (G_{medi} - \xi)^2}, \tag{5.24}$$

$$\mu_{edge}(\sigma) = e^{-\lambda_{e3} (\sigma - \xi)^2}, \tag{5.25}$$

where $\lambda_{e2}$ and $\lambda_{e3}$ are constants, also determined by boundary conditions.


Case 3: Mixed-area Membership Function $\mu_{mixed}$

For an image containing mixed area, there are no definite parameters such as $G_{avg}$,
$G_{medi}$ and $\sigma$ that can be predicted. The parameters for such an image depend
on the type and pattern of the mixed area. However, the membership function can be
easily ascertained by means of fuzzy set theory. According to fuzzy set theory, at
each point, the total degree should be one. For example, taking the gradient average
$G_{avg}$ as the parameter, the following constraint should be satisfied:

$$\mu_{shade}(G_{avg}) + \mu_{edge}(G_{avg}) + \mu_{mixed}(G_{avg}) = 1. \tag{5.26}$$

Therefore, the mixed-area membership function can be derived from the shade
and edge membership functions, that is,

$$\mu_{mixed}(G_{avg}) = 1 - \mu_{shade}(G_{avg}) - \mu_{edge}(G_{avg}), \tag{5.27}$$

$$\mu_{mixed}(G_{medi}) = 1 - \mu_{shade}(G_{medi}) - \mu_{edge}(G_{medi}), \tag{5.28}$$

$$\mu_{mixed}(\sigma) = 1 - \mu_{shade}(\sigma) - \mu_{edge}(\sigma). \tag{5.29}$$


The  constants  A*  can  be  determined  by  the  boundary  conditions.  That  is, 


(a)  When  /J.shade( 0)  =  1,  it  is  a  pure  shade-area,  and  then  fiedge(0)  =  0. 

(b)  When  fj,edge(0  =  1,  it  is  a  boundary  field,  and  then  fishade(0  =  0. 


It may be noted that a Gaussian membership function reaches zero only as its
argument tends to infinity. In practice, a threshold $\varepsilon$ is therefore used
in place of zero, where $\varepsilon$ is a very small positive number. Consequently,
the boundary condition equations to be solved are

$$\begin{cases} \mu_{shade}(0) = 1 \\ \mu_{edge}(0) = \varepsilon \end{cases}
\qquad
\begin{cases} \mu_{edge}(\xi) = 1 \\ \mu_{shade}(\xi) = \varepsilon. \end{cases}$$

Then, the constants can be obtained by


Ael  =  A 


In  e 


(5.30) 


(5.31) 
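
As a worked illustration of how the boundary conditions fix the constants, suppose
the shade and edge membership functions take the Gaussian forms
$\mu_{shade}(G) = \exp(-G^{2}/A_s)$ and $\mu_{edge}(G) = \exp(-(G-\xi)^{2}/A_e)$.
These exact forms and the names $A_s$ and $A_e$ are assumptions made for this
sketch; the text only states that the functions are Gaussian. Under that
assumption the boundary conditions give:

```latex
\begin{align*}
\mu_{shade}(0) &= e^{0} = 1, &
\mu_{shade}(\xi) &= e^{-\xi^{2}/A_s} = \varepsilon
  \;\Longrightarrow\; A_s = -\frac{\xi^{2}}{\ln \varepsilon},\\
\mu_{edge}(\xi) &= e^{0} = 1, &
\mu_{edge}(0) &= e^{-\xi^{2}/A_e} = \varepsilon
  \;\Longrightarrow\; A_e = -\frac{\xi^{2}}{\ln \varepsilon}.
\end{align*}
```

Because $0 < \varepsilon < 1$, $\ln \varepsilon$ is negative, so both constants are
positive and equal. With $\varepsilon = 0.01$ and $\xi = 128$ this assumed form
would give a constant of roughly $3.56 \times 10^{3}$.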
So, the membership functions with parameter $G_{avg}$ can be written as

$$\mu_{shade}(G_{avg}) = e^{\cdots}, \qquad (5.32)$$

$$\mu_{edge}(G_{avg}) = \cdots \qquad (5.33)$$

$$\mu_{mixed}(G_{avg}) = 1 - e^{\cdots} \cdots \qquad (5.34)$$


Given $\varepsilon = 0.01$ and $\xi = 128$, one sample of the membership function
distributions of an image containing edge, shade or mixed area against $G_{avg}$
is illustrated in Figure 5.4. The figure shows clearly that an image having a
particular value of $G_{avg}$ may be said to contain edge, shade or mixed area, but
with different degrees of membership.

Figure 5.4: Membership functions with gradient average
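
The behaviour of the three membership functions can be sketched numerically. The
following Python fragment is only an illustration: it assumes Gaussian shade and
edge functions of the form exp(-G^2/A) and exp(-(G - xi)^2/A), with A chosen so
that each function falls to the threshold at the other function's center. That
parametrization, the function names, and the clipping of the mixed degree at zero
are assumptions of this sketch, not taken from the text.

```python
import math


def gaussian_constant(xi: float, eps: float) -> float:
    """Constant A such that exp(-xi**2 / A) == eps (assumed Gaussian form)."""
    return -xi ** 2 / math.log(eps)


def memberships(g_avg: float, xi: float = 128.0, eps: float = 0.01) -> dict:
    """Shade, edge and mixed membership degrees for a gradient average."""
    a = gaussian_constant(xi, eps)
    shade = math.exp(-g_avg ** 2 / a)         # centered at 0 (pure shade)
    edge = math.exp(-(g_avg - xi) ** 2 / a)   # centered at xi (boundary field)
    # Complement rule of Eq. (5.27). Clipped at 0 (an assumption) because
    # shade + edge slightly exceeds 1 at G = 0 and G = xi when eps > 0.
    mixed = max(0.0, 1.0 - shade - edge)
    return {"shade": shade, "edge": edge, "mixed": mixed}


if __name__ == "__main__":
    for g in (0.0, 64.0, 128.0):
        print(g, memberships(g))
```

Under these assumptions an image with a gradient average halfway between 0 and
$\xi$ belongs to the shade and edge classes to the same degree, and its mixed
membership is largest there.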

3.  Fuzzy  Moment  (FM)  Descriptors 

By taking the idea of fuzzy membership-distance from [4, 45], we define fuzzy
moments between two images to evaluate the degree of difference between them. The
definition is as follows.

Definition 5.3.1. The fuzzy shade moment $[FM_{ij}]_{shade}$ is defined as the
Euclidean distance between the membership values of two corresponding images
$I_i$ and $I_j$. That is,

$$[FM_{ij}]_{shade} = \| \mu^{i}_{shade} - \mu^{j}_{shade} \|. \qquad (5.35)$$

Simply, it can be rewritten as:

$$[FM]_{shade} = \| \mu_{shade} - \mu'_{shade} \|. \qquad (5.36)$$

Fuzzy edge and mixed moments are obtained from this definition by replacing
the term ‘shade’ with the appropriate feature.
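
A minimal sketch of this distance in Python is given below. The function name
`fuzzy_moment` and the representation of membership values as plain lists are
assumptions made for illustration.

```python
import math
from typing import Sequence


def fuzzy_moment(mu_a: Sequence[float], mu_b: Sequence[float]) -> float:
    """Euclidean distance between two equally long vectors of membership values."""
    if len(mu_a) != len(mu_b):
        raise ValueError("membership vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mu_a, mu_b)))
```

When a single parameter such as $G_{avg}$ is used, each vector has one entry and
the moment reduces to the absolute difference of the two membership values.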

4.  Fuzzy  Uncertainty  Inference 

How, then, does the intelligent approach perform image change detection? Since
image understanding in human beings is also the result of reasoning in terms of
knowledge about the world, uncertainty and fuzziness are inevitable problems.
Fuzzy inference under uncertainty is therefore required.

In general, fuzzy inference with uncertainty can be expressed as an IF-THEN
fuzzy product rule:

IF <Evidence> THEN <Strategy> (CF)

where $CF \in [0, 1]$ is a certainty factor indicating the certainty with which a
fact is believed. A certainty factor of one indicates that it is very certain that
a fact is true, and a certainty factor of zero indicates that it is very uncertain
that a fact is true.

Uncertainty occurs when one is not absolutely certain about a piece of
information. Uncertainty also refers to the lack of adequate or correct
information for making a decision. Considering $G_{avg}$ as a parameter, three
fuzzy moments are associated with change detection, i.e., $[FM(G_{avg})]_{shade}$,
$[FM(G_{avg})]_{edge}$ and $[FM(G_{avg})]_{mixed}$. If all three parameters are
considered, there are nine fuzzy moments. How do we make the final decision
according to these moments? This reveals important deficiencies in areas such as
the ability to detect inconsistencies in the knowledge. In order to make a
reliable decision, a key factor CF is used to evaluate the degree of certainty.
Some key ideas relevant to the determination of CF are discussed as follows.

•  Detecting  change  based  on  shade  feature 
This kind of detection provides area change information. The decision depends
in particular on the fuzzy shade moments, i.e., $[FM(G_{avg})]_{shade}$,
$[FM(G_{medi})]_{shade}$, and $[FM(\sigma)]_{shade}$.

Notice that CF evaluates the certainty with which an image change is believed to
have occurred, while the fuzzy moment indicates the degree of the difference. The
larger the fuzzy moment, the more certain it is that a change has occurred. On
this basis, it is acceptable to take the moment as a CF. For example, we can say:

The image area change has been detected based on the gradient average with
certainty $CF_{shade}(G_{avg})$, where

$$CF_{shade}(G_{avg}) = [FM(G_{avg})]_{shade}. \qquad (5.37)$$

•  Detecting  change  based  on  edge  feature 

This kind of detection is associated with boundary changes. The main factors
include the fuzzy edge moments $[FM(G_{avg})]_{edge}$, $[FM(G_{medi})]_{edge}$,
and $[FM(\sigma)]_{edge}$.

The image boundary change has been detected based on the gradient average
with certainty $CF_{edge}(G_{avg})$, where

$$CF_{edge}(G_{avg}) = [FM(G_{avg})]_{edge}. \qquad (5.38)$$

•  Detecting  change  based  on  mixed-area  feature 

From the definition of the mixed area, it is clear that “mixed” is a very fuzzy
concept. It is not easy to make a decision using mixed-area feature information
alone; it should be combined with shade and/or edge feature information.

Two considerations guide the design:

(a) Combining the shade feature with mixed-area feature information (fuzzy
moment), or the edge feature with mixed-area feature information, provides a way
for intelligent reasoning. In addition, the edge and shade features are considered
as two independent features.

(b) The mean and the deviation are two important measures from the statistical
point of view. One measures the location of the distribution, the other the
dispersion about the mean. A single measure will lead to a bias in decision
making, so the two should be considered simultaneously.

On this basis, a hierarchical fuzzy inference model can be designed in the manner
shown in Figure 5.5. Level 1 performs a fuzzy inference associated with the
statistical measures. An advanced fuzzy inference with respect to image features
is carried out at Level 2.

Table 5.1 shows the groups of the moments used to determine CF based on two
statistical measures.


Table 5.1: Groups of the Fuzzy Moments for CF

CF                      [FM(σ)]_shade    [FM(σ)]_edge    [FM(σ)]_mixed
[FM(G_avg)]_shade       CF^1_shade
[FM(G_medi)]_shade      CF^2_shade
[FM(G_avg)]_edge                         CF^1_edge
[FM(G_medi)]_edge                        CF^2_edge
[FM(G_avg)]_mixed                                        CF^1_mixed
[FM(G_medi)]_mixed                                       CF^2_mixed

The relationship between different groups is a disjunction, and the relationship
within the same group is a conjunction. According to fuzzy set theory, conjunction
and disjunction can be defined as the minimum and maximum, respectively, of the
involved facts. Therefore, the certainty factors can be determined by the
following formulas:

$$CF^{1}_{shade} = \min\{[FM(\sigma)]_{shade}, [FM(G_{avg})]_{shade}\},$$

$$CF^{2}_{shade} = \min\{[FM(\sigma)]_{shade}, [FM(G_{medi})]_{shade}\}, \qquad (5.39)$$

$$CF^{1}_{edge} = \min\{[FM(\sigma)]_{edge}, [FM(G_{avg})]_{edge}\},$$

$$CF^{2}_{edge} = \min\{[FM(\sigma)]_{edge}, [FM(G_{medi})]_{edge}\}, \qquad (5.40)$$

$$CF^{1}_{mixed} = \min\{[FM(\sigma)]_{mixed}, [FM(G_{avg})]_{mixed}\},$$

$$CF^{2}_{mixed} = \min\{[FM(\sigma)]_{mixed}, [FM(G_{medi})]_{mixed}\}. \qquad (5.41)$$

Therefore, the certainty factors for detecting feature changes can be given:

$$CF_{shade} = \max\{CF^{i}_{shade}\},$$

$$CF_{edge} = \max\{CF^{i}_{edge}\},$$

$$CF_{mixed} = \max\{CF^{i}_{mixed}\}, \qquad (5.42)$$

where $i = 1, 2$.

Up to now, a statistics-based fuzzy inference has been worked out. Since the mixed
area is related to both shade and edge, an advanced fuzzy inference can be carried
out by integrating the mixed-area information into shade or edge change detection.
More details are given as follows.

A certainty factor for shade change detection can be calculated as

$$CF(shade) = \alpha \, CF_{shade} + (1 - \alpha) \, CF_{mixed}. \qquad (5.43)$$

In the same way, a certainty factor for boundary change is

$$CF(edge) = \beta \, CF_{edge} + (1 - \beta) \, CF_{mixed}, \qquad (5.44)$$

where $\alpha$ and $\beta$ are weights; in practice, they are determined by
experience.

Finally, the certainty factor with which the image changes have been detected can
be obtained by

$$CF = \max\{CF(shade), CF(edge)\}. \qquad (5.45)$$

The certainty factor CF indicates the degree to which the detected image changes
can be trusted: the larger the certainty factor, the higher the degree of trust.
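
The two-level inference of Equations (5.39) to (5.45) can be sketched in a few
lines of Python. The dictionary layout, the key names, and the default weights of
0.5 are assumptions chosen for illustration; the text states only that the weights
are determined by experience.

```python
from typing import Dict

FEATURES = ("shade", "edge", "mixed")


def certainty_factor(fm: Dict[str, Dict[str, float]],
                     alpha: float = 0.5, beta: float = 0.5) -> float:
    """Combine nine fuzzy moments into one certainty factor.

    fm[feature][parameter] holds a fuzzy moment, where feature is one of
    'shade', 'edge', 'mixed' and parameter is one of 'avg', 'medi', 'sigma'.
    """
    level1 = {}
    for feature in FEATURES:
        # Conjunction (min) inside each group, Eqs. (5.39)-(5.41).
        cf1 = min(fm[feature]["sigma"], fm[feature]["avg"])
        cf2 = min(fm[feature]["sigma"], fm[feature]["medi"])
        # Disjunction (max) between the two groups, Eq. (5.42).
        level1[feature] = max(cf1, cf2)
    # Level 2: blend the mixed-area evidence into shade and edge, Eqs. (5.43)-(5.44).
    cf_shade = alpha * level1["shade"] + (1 - alpha) * level1["mixed"]
    cf_edge = beta * level1["edge"] + (1 - beta) * level1["mixed"]
    return max(cf_shade, cf_edge)  # Eq. (5.45)


# Hypothetical moments, invented purely to exercise the function.
example = {
    "shade": {"avg": 0.2, "medi": 0.1, "sigma": 0.3},
    "edge": {"avg": 0.6, "medi": 0.4, "sigma": 0.5},
    "mixed": {"avg": 0.1, "medi": 0.3, "sigma": 0.2},
}
cf = certainty_factor(example)
```

Taking the minimum within a group means that a change is trusted only as far as
both the dispersion measure and the location measure support it.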


Chapter  6 

EVALUATION  AND  RESULT  ANALYSIS 


Detecting and representing changes to data is important for active GIS databases.
The previous chapter focused on developing the change detection algorithm using
satellite images. This chapter first provides a theoretical analysis, then
evaluates the approach further using real images, and finally summarizes the
hybrid change detection approach.

6.1  Theoretical  Analysis  of  Image  Change  Detection 

For theoretical evaluation, a simple image with regular object shapes will be considered.

Histogram Analysis

Consider an image with five objects of regular shape, as shown in Figure 6.1(a1).
Different colors are used only to represent different gray-scale values.

Histograms can be calculated straightforwardly. A simple process can be
summarized as follows:

•  set  the  whole  list  to  zero; 

•  scan  all  pixels  of  the  image,  and  take  the  gray  value  as  the  index  to  the  list; 

•  increment  the  corresponding  element  of  the  list  by  one. 
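
The three steps above can be sketched directly in Python. The function name
`gray_histogram`, the nested-list image representation, and the default of 256
gray levels are assumptions made for illustration; gray values are assumed to lie
in the range 0 to levels - 1.

```python
from typing import List, Sequence


def gray_histogram(image: Sequence[Sequence[int]], levels: int = 256) -> List[int]:
    """Count how many pixels take each gray value."""
    hist = [0] * levels          # set the whole list to zero
    for row in image:            # scan all pixels of the image
        for gray in row:
            hist[gray] += 1      # the gray value is the index into the list
    return hist
```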

The corresponding raster model and histogram plot are shown in Figures 6.1(a2)
and 6.1(a3), respectively. An elementary classification of the objects within the
image scene is clearly illustrated in Figure 6.1(a3). Structure change can be
detected by comparing Figures 6.1(a) and 6.1(b). Moreover, the inability to
distinguish two objects with the same gray values is illustrated by Figures
6.1(a) and 6.1(c).

This simple analysis shows that histogram analysis fails to detect a change when
objects have the same brightness values. Statistical analysis only indicates the
probability that a difference occurred. In order to provide additional
information, a more advanced change detection should be performed.
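
A related limitation can be illustrated with a small hypothetical example. The two
3x3 arrays below are invented for illustration and are not the images of Figure
6.1. Because a histogram records only how often each gray value occurs, two images
that contain the same gray values in different places produce identical
histograms.

```python
from collections import Counter


def histogram(image):
    """Count how many pixels take each gray value."""
    return Counter(gray for row in image for gray in row)


# An object of gray value 9 changes position and shape between the two images.
before = [[0, 0, 0],
          [0, 9, 9],
          [0, 0, 0]]
after = [[9, 0, 0],
         [9, 0, 0],
         [0, 0, 0]]

same_histogram = histogram(before) == histogram(after)  # histograms agree
same_image = before == after                            # images differ
```

A comparison based on histograms alone would therefore report no change here,
which is the kind of case the additional analysis is meant to catch.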
Figure 6.2: Membership functions for theoretical analysis

3. For the image $I_3$ in Figure 6.1(c1), $G_{avg}(I_3) = 47.216$,
$\sigma(I_3) = 33.129$, and $G_{medi}(I_3) = 56.23$.

Let the image $I_1$ be the reference (base) image, and let $\varepsilon = 0.01$.
The membership functions that describe whether an image contains edge, shade or
mixed area, plotted against the mean $G_{avg}$ and the standard deviation
$\sigma$, are illustrated in Figure 6.2. The fuzzy uncertainty inference structure
obtained by comparing image $I_1$ and image $I_2$ is shown in Figure 6.3. Figure
6.4 shows the fuzzy uncertainty inference result obtained by comparing image $I_1$
and image $I_3$.

There is a large quantitative difference between the resulting certainty factors,
0.029 and 0.0001, which indicates that the intelligent algorithm based on fuzzy
inference can perform better change detection than the statistical analysis
method. Specifically, it can detect two objects with the same gray-scale values.

Although the analysis is based on a simple image consisting of just 9x9 pixels,
this example provides an insight into the utility of this intelligent image change
detection. Fuzzy set theory is not a panacea [2], but it does offer significant
potential for image change detection.

Figure 6.3: A hierarchical fuzzy uncertainty inference for image $I_1$ and image $I_2$

Figure 6.4: A hierarchical fuzzy uncertainty inference for image $I_1$ and image $I_3$
6.2 Evaluation via Real Image Data

The real raster data used to evaluate the detection algorithm were selected from
the 2D/3D City Digital map database [60]. For structure change detection, only
histogram analysis is provided for the pre-detection.


Histogram  Analysis 

Pre-detection again relies on the gray-level histogram, which gives a graphical
representation of how many pixels within an image fall into the various gray-level
intervals.

Figure 6.5 illustrates the pre-detection, where

Figure 6.5 (a1) is Ground Topography Raster Data.

Figure 6.5 (b1) is Ground Topography and Building Height Raster Data.

Figure 6.5 (c1) is Morphological Clutter Raster Data, which includes building,
open area, forest, and water.

Figure 6.5 (a2), (b2) and (c2) are the respective grey-scale images, and Figure
6.5 (a3), (b3) and (c3) are the corresponding histograms of the grey-scale images.

From the histogram of an image, it can be seen immediately whether the image is
basically dark or light and whether it has high or low contrast.

The peaks in the histogram shown in Figure 6.5(c3) are associated with “building,
forest, water and open area”. The histogram analysis shows that there are at least
four different types of objects within that image.

Comparing the histograms of different images that cover the same area gives the
pre-detection result, that is, whether changes have occurred statistically.
However, judging how far the change can be trusted requires further analysis.

Fuzzy  Intelligent  Analysis 

Figure 6.6 shows the original images and the corresponding feature images derived
by edge extraction.

Figure 6.5: Pre-detection for real image data

Figure 6.6: Real image feature extraction for post-detection

After the extraction of image features is performed, the following statistical data
for fuzzy intelligent detection can be calculated.

1. In the case shown in Figure 6.6(a), the gradient average $G_{avg} = 7.9$, the
standard deviation $\sigma = 19.9$, and the median value $G_{medi} = 0$.

2. In the case shown in Figure 6.6(b), the gradient average $G_{avg} = 15.4$, the
standard deviation $\sigma = 31.7$, and the median value $G_{medi} = 0$.

3. In the case shown in Figure 6.6(c), the gradient average $G_{avg} = 18.7$, the
standard deviation $\sigma = 39.6$, and the median value $G_{medi} = 0$.
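
Statistics of this kind can be computed from a gradient image in a few lines. The
sketch below is only illustrative: this passage does not say which gradient
operator was used for edge extraction, so simple central differences are assumed,
and the population standard deviation is assumed for $\sigma$. All function names
are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def _diff(values: Sequence[float], k: int) -> float:
    """Derivative of a 1-D sequence at index k (one-sided at the ends)."""
    if len(values) == 1:
        return 0.0
    if k == 0:
        return values[1] - values[0]
    if k == len(values) - 1:
        return values[-1] - values[-2]
    return (values[k + 1] - values[k - 1]) / 2.0


def gradient_magnitudes(image: Sequence[Sequence[float]]) -> List[float]:
    """Gradient magnitude at every pixel of a rectangular gray-scale image."""
    rows, cols = len(image), len(image[0])
    columns = [[image[r][c] for r in range(rows)] for c in range(cols)]
    return [math.hypot(_diff(image[r], c), _diff(columns[c], r))
            for r in range(rows) for c in range(cols)]


def gradient_statistics(image: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Gradient average, standard deviation and median of an image."""
    mags = gradient_magnitudes(image)
    return {
        "G_avg": statistics.mean(mags),
        "sigma": statistics.pstdev(mags),
        "G_medi": statistics.median(mags),
    }
```

A median of zero together with a positive mean, as in the three cases above,
indicates that more than half of the pixels lie in flat regions with no gradient.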

Based on these parameters, the intelligent change detection can be analyzed. Let
the image shown in Figure 6.6(b) be the reference image. Then the membership
functions with respect to the gradient average and standard deviation are given
in Figure 6.7 and Figure 6.8, respectively. By comparing the images in Figure 6.6
and (b), the fuzzy inference is represented as in Figure 6.9.

Figure 6.7: Membership functions with gradient average

Figure 6.8: Membership functions with standard deviation


6.3  Real  Implementation  Consideration 

Successful image change detection based on satellite data depends on many factors.
Ideally, the image data used to perform change detection should be acquired by a
remote sensor system that holds the basic resolutions (temporal, spatial, spectral
and radiometric) constant.

Temporal Resolution: The data used to perform image change detection should be
acquired at approximately the same time of day. This eliminates diurnal sun angle
effects. Whenever possible, it is desirable to use data acquired on anniversary
dates. In this way, the seasonal sun angle and plant phenological differences that
can destroy the change detection can be removed.

Spatial Resolution: Although it is possible to perform change detection using data
collected from two different sensor systems with different instantaneous fields of
view (IFOV), ideally the data are acquired by a sensor system that collects data
with the same IFOV.

Environmental characteristics are also important when performing change detection.
For example, atmospheric and soil moisture conditions and the phenological cycle
will affect the change detection. It is desirable to hold environmental variables
as constant as possible.

Although the intelligent change detection has not been completely implemented in
a real system, the theoretical analysis and the evaluation on real raster data
show that it can achieve reasonable success in evaluating the change observed in
the scene. However, further investigation is still necessary by applying it to a
real-world problem. This is because the evaluation is only loosely associated with
a real system, and because a real implementation requires additional
transformation of data between different formats, such as vector to raster.


6.4  Summary  of  the  Hybrid  Approach 

The  hybrid  method  for  image  change  detection  can  be  summarized  as  shown  in 
Figure  6.10. 
Figure 6.10: Summary of the hybrid image change detection


Chapter  7 
CONCLUSION 


In a distributed environment, compatibility and consistency are major concerns.
The technical approach to handling these issues is referred to as “conflation”.
Conflation is now becoming such a broad consideration in GIS that most
GIS-oriented organizations need conflation technology, even if many of them do not
yet fully understand its role. The payoff of successful conflation in better
meeting GIS data needs is enormous. Therefore, investigating and solving
conflation problems in the distributed intelligent mobile agent system is of great
significance.

Since change over time in a geographic area is particularly important in GIS
applications, a significant amount of research in recent years has been devoted to
the development of change detection methods. However, a more direct approach is
to seek information resources that record the changes directly. Image data from
satellites have proved to be extremely useful for change detection techniques.
Indeed, image change detection has great potential for augmenting regular
conflation.

In this conclusion, the main contributions and results are emphasized, the open
research issues are discussed, and future work is outlined.


7.1 Main Contributions

With respect to conflation and image change detection issues in GIS, the main
contributions of the dissertation include:


•  A  flexible  conflation  model 

In order to remedy a defect of regular conflation algorithms, the major gaps or
shortfalls in conflation are identified, and a raster-based vector conflation
model is then developed. The main idea is to use image change detection to enhance
the conflation capabilities. The model differs significantly from previous
attempts, inasmuch as the detecting process uses real-time satellite image data.
In this way, a vector-based GIS system may augment the conflation with image
change detection in the raster domain by performing a vector-to-raster conversion.
The model is also more general, since it can work in two different ways, that is,
pre-detection/post-conflation and pre-conflation/post-detection.

•  A  conflation  scheme  based  on  vector  data 

Based on the conflation model developed for distributed mobile agent systems, the
dissertation focuses especially on processing spatial and non-spatial conflation
problems in vector GIS. By emphasizing that conflation is an inexact process, a
component-ware conflation scheme is investigated, in which spatial conflation is
performed mainly with mathematical tools such as distance measures, and
non-spatial conflation is performed by employing artificial intelligence
techniques. These components can be assembled to process any specific problem.

•  A  hybrid  change  detection  approach 

Since image interpretation frequently involves human knowledge, which is
inherently incomplete and imprecise, the problems associated with fuzziness and
uncertainty are of great concern in image change detection. An important
contribution of the work is a hybrid approach to image change detection that takes
advantage of both statistical analysis methods and fuzzy inference. It is a
two-step method, in which the intelligent change detection can perform a fuzzy
inference under uncertainty.

As the complexity and intelligence of GIS applications progress, the utility of
intelligent techniques such as fuzzy sets will increase. Incorporating fuzzy sets
in GIS is becoming a cost-effective approach to addressing complex problems. The
novelty of the approach presented here is to employ a technique with more
potential: fuzzy uncertainty inference. The hierarchical inferencing structure
presented indicates a simple way to perform a fuzzy inference. The approach is
examined by its ability to support a simple image change detection. Furthermore,
the tests on real images have shown that it is capable of detecting meaningful
changes.
7.2 Open Research Issues

Conflation is a practical and complex process. Based on the model presented, the
focus of the dissertation is on spatial and non-spatial conflation in vector data,
and especially on image change detection for augmenting the conflation. A real
implementation will also require many other steps, such as transformation between
different formats, vectorization of the raster data, and development of knowledge
bases. All of these are open research areas that require further investigation.


7.3  Future  Work 

Even though the dissertation developed a flexible conflation model, its
implementation remains a challenging area for further investigation, since the
real world is very complicated. The following discussion provides some research
ideas for improving the change detection in the future.


•  Modification  of  the  membership  functions 

The specification and tuning of membership functions has been a source of much
criticism leveled at the fuzzy logic approach [41]. Often the method of specifying
the membership function is summarily described as being chosen by experts [1]. A
considerable amount of research and many applications involve specifying
membership functions in GIS, such as deriving fuzzy membership functions from
experiments with GIS users for decision support [33] and extracting fuzzy
membership functions from volumes of historical data for a contextual fuzzy
cognitive map framework [52].

It is important to note that the membership functions in the intelligent image
change detection are defined by interpreting the images, which also depends on
experience. Further modification of the membership functions should be able to
improve the fuzzy intelligent detection.

•  Determination  of  the  weights 

To perform reasoning under uncertainty in the fuzzy intelligent detection, the
weights at level two have to be pre-defined. However, the determination of the
weights in the hierarchical inferencing structure is still open. Designing a
knowledge base for the weights is a potential way to improve the algorithm
presented here.
Preface

Uncertainty  in  Geographic  Information  Systems  and  Spatial  Data 


This special issue of Fuzzy Sets and Systems presents a collection of papers on
the topic of uncertainty in geographic information systems (GIS) and spatial data.
The complexity of spatial data, its interpretation and use in GIS entail many
aspects of uncertainty. Our intention for this issue is to more broadly expose
researchers in fuzzy logic and soft computing to these topics. They represent a
rich environment for further modeling and development by the fuzzy set community.
We believe a valuable cross-fertilization with considerable uncertainty research
efforts in the GIS community will be facilitated by exposure to papers in this
special issue.

The significance of this topic can be assessed by the announcement from the
University Consortium for Geographic Information Science (in the US) in 1997 that
uncertainty is one of the ten research priorities in GIS. These research
priorities have already had considerable influence on funding directions. It
should also be noted that included among the contributors to this collection are
well-known researchers in the area of GIS. Mike Goodchild, who has written a
special introduction to this collection, is a leader in the field and is
internationally known for his work on issues of accuracy and imprecision in GIS.
Peter Fisher, whose article leads off this collection, is the editor-in-chief of
arguably the most important journal in the GIS area, the International Journal of
Geographic Information Science. Another author, Peter Burrough, has recently
edited an influential volume surveying imprecision concerns for geographic
objects.

This special issue has evolved from two special sessions the editors organized at
FUZZ-IEEE'98 in Anchorage, Alaska and IPMU'98 at La Sorbonne in Paris. Several
papers from these sessions were extended for this collection and are included as
well as others submitted in response to an open call. The papers cover a spectrum
of work involving imprecision and uncertainty relative to geographic and spatial
concerns. The first two papers in the collection provide a general view of broad
issues and representations for spatial data. The next four papers illustrate
various usages of fuzzy sets by GIS researchers in the application areas. Then,
the following two papers examine some approaches in spatial reasoning based on
fuzzy set representation. The last two papers in this volume describe aspects of
natural language concerns for uncertainty in GIS.

We wish to thank all researchers who have shown interest in this effort and
submitted papers, and we also acknowledge the valuable efforts of the reviewers
who had to produce their reviews on a very tight schedule. Finally, we gratefully
acknowledge the founding editor of Fuzzy Sets and Systems, Hans Zimmermann, who
was very supportive and helpful in his handling of this issue.
This special issue of Informatica focuses on several research topics in the area
of spatial data management. Spatial databases have developed as extensions to
ordinary databases in response to rapidly developing applications such as
Geographic Information Systems (GIS), CAD systems, and many multimedia
applications. These applications dictate the need for (1) additional data types,
including point, line and polygon; (2) spatial operations such as intersection,
distance, etc.; and (3) the ability to handle some combination of objects (vector
data) and fields (raster data). The nature of spatial data requires
multi-dimensional indexing to enhance performance, and much research has been
devoted to this topic.

The first set of papers in this issue provides descriptions of extensions to the
standard functionality in GIS databases. In particular, these first three papers
discuss extensions for space-time visualization, sound as a spatio-temporal field
and the inclusion of network facilities in a GIS. A realization of visual aspects
of the space-time conceptual framework of Hagerstrand is discussed in the first
paper by Hedley, Drew, Arfin and Lee. They demonstrate one of the first examples
of an implementation of this approach and provide a real-world case study of
visualizing worker exposure to hazardous materials. In the second paper (Laurini,
Li, Servigne, Kang) a field-oriented approach to auditory data in a GIS is
described. The special semantics of auditory information are presented and some
techniques for indexing of such data are indicated. The third paper in this set
adds additional levels of abstraction to extend the semantics of GIS networks. The
authors, Claramunt and Mainguenaud, then show this leads to the development of a
new interpretation of the database projection operator.

The second set of papers provides techniques for processing and modeling spatial
data. The first paper by Havran provides a new approach for memory mapping of
binary search trees that can improve spatial locality of data and thus spatial
query performance. In the next paper by Chung and Wu, some improved spatial data
structure representations including linear quadtrees are presented. These
representations are shown to have better compression performance. Finally, the
paper by Forlizzi and Nardelli describes the lattice completion of a poset to
model spatial relationships. They prove some of the needed conditions for valid
intersection and union relations among spatial objects with this representation.
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Abstract 


Many spatial data modeling strategies rely upon approximate representations of spatial objects both for computational
efficiency issues as well as the simplification of logical modeling strategies. The most widely used approximation is the
minimum bounding rectangle (MBR). While the use of MBRs in spatial data modeling is extensive due to their efficiency
for storage and relationship calculation, their use as a solitary means of identifying, for example, topological relationships
between objects is problematic due to the inconsistency of mappings between relationships of MBRs and corresponding
relationships of the objects they represent. In this paper we examine several extensions to the MBR model that reduce the
discrepancies between binary spatial relationships of the MBRs and those of the contained objects. For each scheme, we
consider the implications to the determination of fuzzy spatial relationships and the impact on computational issues. © 2000
Published by Elsevier Science B.V. All rights reserved.


Keywords: Minimum bounding rectangle; Fuzzy spatial relationship; Conflation; Multiple rectangle representation



1. Introduction

The process of transforming geometric information corresponding to
real-world geographic objects into a digital model has almost unlimited
potential for the introduction of uncertainty. Particular stages of the
process at which a large part of the resulting uncertainty is related
include: data capture (due to random errors associated with equipment),
discretization (due to sampling resolutions, rounding errors, etc.) and
object identification (due to human-dependent boundary determination and
proper labeling). It is possible, in spite of these obstacles, to
finally arrive at a digital representation, which, though not perfect,
is "good enough". We mean that some system, such as a geographic
information system (GIS), is able to utilize the available data to
provide analysis and query results that are perfectly suitable in terms
of completeness and accuracy for the end-user or application.

These issues aside, assuming that we have some discrete representation
of a geographic domain which is the best possible, yet another issue of
uncertainty remains - that of determining relationships among the
various geographic objects.

Fig. 1. Example of the effect of size on directional relationship determination.


The ability to discriminate between similar spatial relationships and
the ability to communicate subtle differences in such relationships is a
difficult task for automated systems. Especially difficult is the
determination of directions between 2-D features. The use of fuzzy
methods associated with linguistic variables [16] is the most promising
approach so far for duplicating the human reasoning process in the area
of spatial relationship determination.

As an example, consider the three scenes pictured in Fig. 1. In
Fig. 1(a) it is unclear whether the statement "A is west of B" or "A is
southwest of B" better describes the directional relationship between
the two. In Fig. 1(b), however, it is much less controversial to state
simply that "A is west of B". Similarly, in Fig. 1(c), most would agree
that now "A is southwest of B".

Often, centroids are used to provide a single point of reference for
determining orientation of 2-D objects (e.g. [3]). However, some
information loss is inherent in this method. For example, in Fig. 1,
south would be the only directional relationship preserved using
centroids. The model presented in this paper preserves all of the
directional relationships that exist between two 2-D objects, along with
a representation indicating the degree to which the objects are
determined to participate in each of the relationships. This method of
qualifying directional relationships more closely models the way in
which humans naturally process and communicate such information.

The following section provides information on the use of minimum
bounding rectangles (MBRs) as approximations of geometric objects,
including a rationale for their use, associated problems and related
work on the support for and derivation of spatial relationships based on
their use. This is followed by an overview of the abstract spatial graph
model in Section 3. Section 4 follows with descriptions of several
alternative MBR approaches. These are analyzed in Section 5 with respect
to geometric modeling capabilities. Application of the various
approaches to the issue of conflation is presented in Section 6. Our
conclusions regarding the extended approaches are presented in
Section 7.


2.  Background 

The work presented here, as well as that of Nabil [11] and Clementini
[4], relies upon the use of MBRs as approximations of the geometry of
spatial objects. An MBR is defined as the smallest x-y parallel
rectangle that completely encloses an object. The use of MBRs in
geographic databases is widely practiced as an efficient way of locating
and accessing objects in space [4]. In addition, numerous spatial data
structures and indexing techniques have been developed that exploit the
computationally efficient representation of spatial objects through the
use of MBRs [10,13,15]. Another advantage to the use of an MBR
representation is that all objects can be dealt with at the same level
of dimensionality - that is, point, line and area features are all
represented as 2-D objects across which operations can be uniformly
applied.

Clementini [4] shows how MBR relations can be used as a fast filter to
determine whether it is possible for the object to satisfy a given
topological relationship. This approach is based on the identification
of a set of MBR relations for which a consistent mapping between these
relations and object topological relations exists. For example, if two
MBRs are disjoint, then the relationship between the objects must also
be disjoint. An approach for ameliorating the topological consistency
problem is the use of true MBRs proposed by Nabil [11]. A true MBR is
not restricted by the x-y parallelism requirement, but is designed to
represent the true maximum extent of an object unconstrained by
orientation. This approach potentially results in less false area,
thereby reducing the margin for error between MBR and object
relationship mappings.
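
The fast-filter idea can be illustrated with a minimal Python sketch. It
assumes objects are given as lists of (x, y) vertices; the names `MBR`,
`mbr_of` and `mbrs_disjoint` are hypothetical and are not taken from [4]
or [11]. Only the disjoint case is conclusive: when the MBRs overlap,
nothing can be said about the objects without a finer test.

```python
from typing import Iterable, NamedTuple, Tuple


class MBR(NamedTuple):
    x1: float  # lower-left x
    y1: float  # lower-left y
    x2: float  # upper-right x
    y2: float  # upper-right y


def mbr_of(points: Iterable[Tuple[float, float]]) -> MBR:
    """Smallest x-y parallel rectangle enclosing all the given points."""
    pts = list(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return MBR(min(xs), min(ys), max(xs), max(ys))


def mbrs_disjoint(a: MBR, b: MBR) -> bool:
    """True when the rectangles share no point, not even on a boundary."""
    return a.x2 < b.x1 or b.x2 < a.x1 or a.y2 < b.y1 or b.y2 < a.y1


triangle = [(0, 0), (4, 0), (0, 3)]
square = [(5, 1), (7, 1), (7, 3), (5, 3)]
box = [(3, 2), (6, 2), (6, 6), (3, 6)]

# Disjoint MBRs: the enclosed objects must be disjoint as well.
print(mbrs_disjoint(mbr_of(triangle), mbr_of(square)))
# Overlapping MBRs: inconclusive. Here the overlap lies entirely in the
# false area of the triangle's MBR, so the objects themselves are disjoint.
print(mbrs_disjoint(mbr_of(triangle), mbr_of(box)))
```

The second pair shows the false-area problem discussed in the text: the
corner (3, 2) of the box lies outside the triangle even though it lies
inside the triangle's MBR.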

Our goals in this research are to (1) determine a representation for
spatial objects that will alleviate, or at least ameliorate, some of the
problems associated with MBRs, while retaining as many of the desirable
characteristics as possible, and (2) show how the resulting
representation can be used for modeling fuzzy spatial relationships.


3.  Abstract  spatial  graph  model 

This section presents a data structure for representing topological and
directional relationships, in addition to supplementary information
needed for fuzzy query processing. The data structure, known as an
abstract spatial graph (ASG), represents a transformation of 2-D space
(areas) into 0-D space (points). A complete set of ASGs for the original
relationships, including a graphical representation and specific
property sets, was developed in [5].

First-level topological relationship definitions for these original
relationships are based on an extension of Allen's temporal relations
[1] to the spatial domain. In this work, Allen showed that the seven
relationships before, meets, overlaps, starts, during, finishes and
equal, along with their inverses, hold as the complete set of
relationships between two intervals. Cobb [o] extended these to two
dimensions by defining a spatial relationship as a tuple $[r_x, r_y]$,
where $r_x$ is the one of Allen's relationships that represents the
spatial relationship between two objects in the x direction, and $r_y$
is likewise defined for the y direction. Relationships are often
represented by their initial letter; for example, [bo] stands for the
relationship [before, overlaps] and $[bo]^{-1}$ represents the
corresponding inverse relationship. Inverse relationships imply the
reversal of the objects' roles in the relationship. For example,
A [bo] B is equivalent to B $[bo]^{-1}$ A. Objects involved in any of
these relationships are assumed to be enclosed by MBRs.
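
The tuple construction can be sketched in a few lines of Python. This is
an illustration only, under the assumptions that MBRs are given as
(x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2, and that an inverse is
written with a "^-1" suffix on each component; the function names
`allen` and `spatial_relation` are hypothetical.

```python
def allen(a1, a2, b1, b2):
    """Allen relation of interval [a1, a2] with respect to [b1, b2]."""
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 == b1 and a2 == b2:
        return "equal"
    if a1 == b1 and a2 < b2:
        return "starts"
    if a2 == b2 and a1 > b1:
        return "finishes"
    if a1 > b1 and a2 < b2:
        return "during"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    # Anything else is the inverse of one of the relations tested above.
    return allen(b1, b2, a1, a2) + "^-1"


def spatial_relation(a, b):
    """Pair of Allen relations: x direction first, then y direction."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    return (allen(ax1, ax2, bx1, bx2), allen(ay1, ay2, by1, by2))


A = (0, 0, 2, 3)
B = (5, 2, 8, 6)
print(spatial_relation(A, B))  # x: before, y: overlaps
print(spatial_relation(B, A))  # the roles are reversed, so both invert
```

Swapping the arguments yields the inverse of each component, which
matches the statement that A [bo] B is equivalent to B [bo] inverse A.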

Formal definitions for each of the relationships are given in terms of a
set of constraints based on corner positions, each of which must hold
between the MBRs for each object. For example, for the case of
A [finishes, starts] B, the definition is given as:
$\{B_{x1} < A_{x1} < B_{x2},\ A_{x2} = B_{x2},\ B_{y1} < A_{y2} < B_{y2},\ A_{y1} = B_{y1}\}$,
where (x1, y1) and (x2, y2) represent the lower left and upper right
corners, respectively, of the MBRs. This is somewhat similar to the way
in which the eight spatial relational operators are defined in [9] by
sets of relationships involving the operators min, max and length of
objects, which are then combined with object representations to form a
2-D string [2] for spatial reasoning.

The concept of extending Allen's temporal relations to two or more
dimensions for spatial reasoning is not new. Examples of how this has
been done can be found in [8,11], to name a few. For each, the approach
taken is somewhat different, based on the intent of the work. However,
the concept of representing a 2-D object as a set of two intervals, an x
and a y, and of having the resulting spatial relationship consist of
some combination of the component 1-D relations remains key. In
contrast, Egenhofer's well-known model for topological relations [7]
utilizes a point-set approach in which relationships are based on
combinations of intersections between boundaries and interiors of
objects.

Directional relationship definitions in [5] rely upon the partitioning
of MBRs into object sub-groups, which are created by extending the
boundaries of the two MBRs so that they intersect one another. For those
cases in which extensions do not intersect the other MBR, each MBR is
considered to be an object sub-group. Overlapping areas are also
considered object sub-groups. The extensions of the MBR sides that form
object sub-groups are shown as dotted lines in Fig. 2.

The construction of an ASG for a binary spatial relationship relies
heavily upon these object sub-groups. Each object sub-group is
represented as a node on the ASG. Pictorially, ASGs are represented in a
polar graph notation, where different node representations, e.g., filled
vs. open for Fig. 2, are used to distinguish between the objects
involved in the relationship. The origin node represents the reference
area of the relationship, which could be a sub-group of one of the
objects, an overlapping area, or common boundary. An example of an ASG
and its corresponding relationship is shown in Fig. 2.

To provide support for fuzzy query processing, each node in an ASG has
associated weights. These weights are used to define fuzzy qualifiers
for the query language. Specifically, the weights are intended to
support queries of the nature "To what degree is region A south of
region B?", or "How much of region A overlaps region B (qualitatively
speaking)?".

Two types of weights are computed: area weights and total node weights,
which are represented in Fig. 2 as A and W, respectively. These provide
information concerning the degree of participation in a relationship and
relative direction, respectively. Area weights are computed as the ratio
of the area of an object sub-group to the area of the entire MBR. The
total node weight for an ASG node is defined as the product of the
corresponding area weight and the normalized axis length of the
directional axis which crosses the object sub-group. Normalization of
axis lengths is accomplished for each object by first assigning a length
of 1 to the longest axis crossing any object sub-group of the object.
All other representational axes of the object are then given a value
between 0 and 1, based on their lengths relative to the longest axis for
that object.
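
The two weights can be written out as a short Python sketch. It assumes
each object sub-group is supplied as a rectangle (x1, y1, x2, y2)
together with the already measured length of the directional axis that
crosses it; how that axis length is measured is not restated here, so it
is treated as an input. The function names are hypothetical.

```python
def area(rect):
    """Area of an (x1, y1, x2, y2) rectangle."""
    x1, y1, x2, y2 = rect
    return (x2 - x1) * (y2 - y1)


def node_weights(mbr, subgroups):
    """Return (area_weight, total_node_weight) for each sub-group.

    subgroups is a list of (rect, axis_length) pairs for one object.
    """
    longest = max(length for _, length in subgroups)
    mbr_area = area(mbr)
    weights = []
    for rect, length in subgroups:
        area_weight = area(rect) / mbr_area   # share of the whole MBR
        normalized_axis = length / longest    # longest axis maps to 1
        weights.append((area_weight, area_weight * normalized_axis))
    return weights


mbr = (0, 0, 4, 4)
subgroups = [((0, 0, 4, 1), 2.0), ((0, 1, 4, 4), 4.0)]
for area_weight, node_weight in node_weights(mbr, subgroups):
    print(area_weight, node_weight)
```

In this made-up example the smaller sub-group covers a quarter of the
MBR and lies on an axis half as long as the longest one, so its total
node weight is one eighth.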

From the definition, one can see that area weights are useful for
answering how much of an object is involved in a relationship. By
assigning ranges of area weights to linguistic terms we can provide a
basis for processing queries concerning qualitatively defined
relationships. The set given below is one example of how this may be
done.

{all (96-100%), most (60-95%), some (30-59%), little (6-29%), none (0-5%)}.


This provides the capability to pose queries such as the following:

•  Is object A surrounded-by most of object B?

•  Retrieve an object that is partially-surrounded-by little of object A.

Node weights are utilized in a similar manner to provide qualitative
directional relationship information. The purpose of node weights is to
answer the extent to which an object can be considered to be at a given
direction in relation to another object. The definition of a node weight
as the product of the area weight and the axis length means that this
information is represented as a consideration of both the total relative
amount of the object which lies in a given direction (represented by the
area weight) tempered by how directly it lies in that direction
(represented by the axis length). This implies that those directions
which have both a large area representation and long axis length will
have a higher weight than those which have either a large area
representation but short axis length, or those that have a long axis
length with lesser area representation. Again, ranges are provided that
define a linguistic set useful for query purposes. These are given
below.

{directly (96-100%), mostly (60-95%), slightly (30-59%), somewhat
(6-29%), not (0-5%)}.

The use of these qualifiers is illustrated in the following:

•  Is  object  B  somewhat  west  of  object  A? 

•  Retrieve  an  object  directly  northeast  of  object  A. 
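
The mapping from a computed weight to a linguistic term can be sketched
as a simple table lookup. The percentage bands are the two example sets
given in the text; rounding a weight to the nearest whole percent before
the lookup is an assumption made here, since the bands are stated only
for whole percentages. The names `AREA_TERMS`, `DIRECTION_TERMS` and
`qualifier` are hypothetical.

```python
# (lower bound in percent, term), listed from the highest band down.
AREA_TERMS = [(96, "all"), (60, "most"), (30, "some"),
              (6, "little"), (0, "none")]
DIRECTION_TERMS = [(96, "directly"), (60, "mostly"), (30, "slightly"),
                   (6, "somewhat"), (0, "not")]


def qualifier(weight, terms):
    """Linguistic term for a weight in [0, 1] under the given bands."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    percent = round(weight * 100)  # assumption: nearest whole percent
    return next(term for lower, term in terms if percent >= lower)


print(qualifier(0.72, AREA_TERMS))       # area weight of 72%
print(qualifier(0.15, DIRECTION_TERMS))  # total node weight of 15%
```

A query such as "Is object B somewhat west of object A?" would then
reduce to comparing the term returned for the west node weight with the
qualifier named in the query.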

The preservation of all directional information regarding two objects,
along with the use of total node weights as described here, allows users
to obtain a complete conceptualization of directional relationships
between the objects. Furthermore, the calculation of the total node
weight as the product of the axis weight and area weight ensures against
bias in those cases, for example, where the object sub-group associated
with a directional axis is very large (increasing the weight for that
direction), but for which the axis weight is very small (indicating a
weaker association for that sub-group, direction pair than for others
for the same object).


Fig. 3. Uniform MRRs used for object representations.


4.  Extensions  to  the  standard  MBR  representation 

The ASG model utilizes MBRs to represent features for the advantages
given in the preceding section, while algorithms for computing
relationships for the model are designed in such a way as to minimize
the potentially negative impact of the use of MBRs. Additionally, the
ASG model extends the use of rectangular boundaries as representations
of sub-objects within the MBRs.

Of course, the use of MBRs is inherently problematic to some degree
because an MBR is an approximation of an object's true geometry. One of
the more significant challenges is the modeling of topological spatial
relations using MBR representations. The problem is that the enclosure
of false area (area not actually contained in the geometry of the
object) within the MBR renders inconsistencies between the application
of topological relations to the MBRs vs. the application of the
relations to the objects themselves. Such an inconsistency is especially
evident in the case of overlapping relationships for which the MBRs may
overlap, but for which no conclusion can be drawn about the
corresponding relationship of their respective contained objects.

In keeping with generally accepted practice, both MBRs and the
sub-rectangles utilized in the ASG model retain x-y parallelism;
however, in this section we explore several variations on this
traditional approach and investigate the respective implications to the
modeling of relationships. Our goal in investigating alternative
representations for geometric properties of spatial features is
threefold: (1) alleviate or significantly decrease anomalies in
topological relationship determination based on MBRs, (2) improve
accuracy in determination of directional and topological relationships
between representations, and (3) maintain, as much as possible,
computational efficiency in processing concerning relationship
categorization and querying.

First, we consider the implications of partitioning MBRs into sets of
rectangles, essentially resulting in a gridded surface which is an
approximation we call multiple rectangle representation, or MRR. Three
variations on MRRs which we analyze in this section include (1) a
uniform MRR, (2) a non-uniform, congruent MRR, and (3) a non-uniform,
non-congruent (irregular) MRR. All three variations of MRRs result in a
finer approximation of the object's true geometry than do MBRs, while
maintaining a basic regular, rectangular structure for which
computationally efficient methodologies have been developed.

The first variation can be viewed as the imposition of a grid of any
level of resolution upon an object, after which any of the rectangles
not actually intersecting with a part of the object is discarded. Two
cases presented here include one in which grids of the same resolution
are used for both objects participating in a relationship, as well as
the case in which grids of different resolutions are used. Fig. 3 shows
a simple example of the use of uniform MRRs. The dotted line shows the
original boundaries. Numbers for grid cells have been assigned
arbitrarily, and are used in following discussions on relationship
definitions.
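
The grid imposition can be sketched in Python as follows. To keep the
intersection test exact and short, the sketch assumes the object is
itself supplied as a union of axis-aligned rectangles (x1, y1, x2, y2);
a real implementation would test grid cells against the object's actual
boundary. The function names are hypothetical.

```python
def overlaps(r, s):
    """True when two (x1, y1, x2, y2) rectangles share interior area."""
    return r[0] < s[2] and s[0] < r[2] and r[1] < s[3] and s[1] < r[3]


def uniform_mrr(parts, nx, ny):
    """Impose an nx-by-ny grid on the object's MBR and keep only the
    cells that intersect some part of the object."""
    x1 = min(p[0] for p in parts)
    y1 = min(p[1] for p in parts)
    x2 = max(p[2] for p in parts)
    y2 = max(p[3] for p in parts)
    w = (x2 - x1) / nx
    h = (y2 - y1) / ny
    cells = []
    for j in range(ny):
        for i in range(nx):
            cell = (x1 + i * w, y1 + j * h,
                    x1 + (i + 1) * w, y1 + (j + 1) * h)
            if any(overlaps(cell, p) for p in parts):
                cells.append(cell)
    return cells


# An L-shaped object: a vertical bar plus a horizontal foot.
l_shape = [(0, 0, 1, 3), (1, 0, 3, 1)]
cells = uniform_mrr(l_shape, 3, 3)
print(len(cells), "of 9 grid cells kept")
```

For the L-shaped example the four cells in the upper right of the MBR
are discarded, which removes all of the false area in this case.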

Now we consider enhancements to the ASG to accommodate the use of MRRs
for relationship determination. We begin with the supposition that the
set of ASGs is a closed set, such that any modification to the way in
which relationships are defined does not result in any new ASG. This
issue is discussed further in Section 5. It is apparent, however, that
the way in which the ASGs themselves are defined for MRRs must be
modified to take advantage of the more accurate representation.

We do this by first computing a set of ASGs - one ASG for every
combination of relationships between sub-rectangles of both objects'
MRRs. This results in a set of ASGs for each binary relationship,
$S = \{A_{ij} : 1 \le i \le n,\ 1 \le j \le m\}$, where n is the number
of sub-rectangles in the first object and m is the number of
sub-rectangles in the second object. For example, in Fig. 3 are two
objects with simple, uniform MRRs. The resulting set of relationship
ASGs for this example is

5  =  !.•/••  =  [hoi.  .T;  =  [bo],  .4.;.  =  [bo”1]. 

Ay,  =  [bfci.  A-  =  [bbj.  =  [bo], 

-  [obi.  A\z  -  [bo],  A\\  =  [bo]}. 

An approach for utilizing this information set to gain a more accurate
picture of the relationship under consideration is to first associate a
count with every distinct relationship appearing in the set. For our
example, this would yield
$S' = \{([bo],4), ([bb],3), ([bo^{-1}],1), ([ob],1)\}$. These counts
divided by the total number of sub-rectangle relationship combinations
(9 in the example) are considered as membership values in fuzzy
relationships. Rather than having a single, crisp binary relationship as
was the case in the original ASG model, we now have a set of ASG
relationships, each member of which a given binary relationship belongs
to some degree. For example, the fact that [bo] and [bb] appear as the
predominant relationships for Fig. 3, along with the fact that
$[bo^{-1}]$ and [ob] exist, although to a much lesser degree, conveys a
substantially more significant amount of information than does the
original [oo] designation, which in this case is also inaccurate with
respect to the contained objects' relationship due to the topological
inconsistency problem associated with MBRs.
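
The counting step is small enough to show directly. The sketch below
takes the counts stated for the Fig. 3 example (four [bo], three [bb],
one [bo] with inverted y component, one [ob]) and converts them to
membership values; the string labels and the function name
`fuzzy_memberships` are illustrative choices, and exact fractions are
used only to keep the arithmetic transparent.

```python
from collections import Counter
from fractions import Fraction


def fuzzy_memberships(relations):
    """Membership value of each distinct relationship: its count divided
    by the number of sub-rectangle relationship combinations."""
    counts = Counter(relations)
    total = len(relations)
    return {rel: Fraction(n, total) for rel, n in counts.items()}


# One label per sub-rectangle pair, using the counts given for Fig. 3.
S = ["bo"] * 4 + ["bb"] * 3 + ["bo^-1", "ob"]
memberships = fuzzy_memberships(S)
for rel, value in memberships.items():
    print(rel, value)
```

The membership values sum to one by construction, since every
sub-rectangle pair contributes exactly one relationship.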

We then take this a step further by associating these membership values
with the higher-level relationships defined in [5]. Because these
relationships represent a mutually exclusive, total partitioning of the
basic relationships, a mapping from the members of S to these
relationships will result in at most the same number of relationships as
the number of members of S. However, it is more often the case that
fewer high-level relationships result. This is because (1) such
relationship definition sets are often composed of basic relationship
neighbors, and (2) the use of MRRs in the manner described necessarily
implies relationship categorizations for neighboring sub-rectangles,
resulting in neighboring relationships. In our example, each of [bo],
[bb], $[bo^{-1}]$, and [ob] appears in the disjoint relationship,
leading to the invariable conclusion that the two representations are
indeed disjoint.

This  approach  eliminates  the  need  to  compute  the 
set  of  weights  for  ASG  nodes  as  was  done  in  the 
original  model,  as  similar  information  (the  counts  are 
analogous  to  the  original  weights)  is  now  maintained 
in  S'. 

The application of non-uniform, congruent MRRs can be understood as an
analog to a quadtree decomposition commonly performed for spatial
indexing purposes. The approach begins with standard x-y parallel MBRs,
upon which a quadtree-like decomposition is performed with the MBR being
divided into four equivalent rectangles, any or all of which can then be
divided similarly, continuing until as fine a partitioning as desired is
achieved. At that point, any rectangles not actually containing a part
of the object are discarded. We say the rectangle sets are regularly
hierarchical because the level of detail (size) of the rectangles can
vary across different parts of the object so as to achieve a desired
level of representation, while the area of each larger rectangle is
exactly a power of four times that of any smaller rectangle of the
object. An example of this type of MRR is shown by the grayed area in
Fig. 4.
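
A quadtree-like decomposition of this kind can be sketched in Python.
The sketch makes two assumptions that are not part of the description
above: the object is given as a square boolean raster whose side is a
power of two (standing in for the MBR), and a quadrant is subdivided
only when it is partly, but not wholly, covered by the object. The text
leaves the subdivision policy open, so this is one possible choice.

```python
def quadtree_mrr(mask, x=0, y=0, size=None, min_size=1):
    """Return (x, y, size) squares that together cover the object.

    mask[row][col] is truthy where the object is present.
    """
    if size is None:
        size = len(mask)
    cells = [mask[r][c]
             for r in range(y, y + size)
             for c in range(x, x + size)]
    if not any(cells):
        return []                  # no part of the object: discard
    if all(cells) or size <= min_size:
        return [(x, y, size)]      # keep as a single rectangle
    half = size // 2
    kept = []
    for dy in (0, half):
        for dx in (0, half):
            kept += quadtree_mrr(mask, x + dx, y + dy, half, min_size)
    return kept


mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
rects = quadtree_mrr(mask)
print(rects)
```

The result mixes one 2 x 2 square with one 1 x 1 square, so the larger
rectangle has exactly four times the area of the smaller one.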

This type of boundary approximation more accurately represents the
objects' geometry in comparison to either MBR or uniform MRR
representations. While maintaining the greatly reduced incidence of
topological inconsistencies between true and approximate boundaries
illustrated in the uniform MRRs, additional levels of detail have been
added that allow for more accurate relationship determination. Since the
areas are still rectangular decompositions, computational issues remain
simplified compared to boundary representations.

The most significant issue that must be addressed concerning fuzzy
relationship modeling for this approach is the manner in which the
differently sized rectangles are assessed as contributors to overall
relationship determination. This can be handled by extending the
approach used for uniform MRRs. While for uniform MRRs it was sufficient
to use the mere existence of the basic relationships between
sub-rectangles as equal factors in fuzzy relationship categorizations,
we must now compute a weight analogous to the ASG node weights for each
sub-rectangle relationship to achieve a level of normalization for
"combining" the individual relationships into one.

Recall that the area weights are computed as the ratio of the MBR
sub-object areas to the total MBR area, and that these weights are used
to identify the extent to which an object participates in a given
relationship. Using this same approach for non-uniform congruent MRRs -
calculating a weight as the ratio of a sub-rectangle's area to the
combined area of all sub-rectangles containing a part of the object - we
achieve a level of normalization for use of differently sized
rectangles.

For each applicable relationship, the weights for every different
sub-rectangle of each object for that relationship are summed, resulting
in a value $\le 1$. Whenever one sub-rectangle participates in the same
relationship with multiple sub-rectangles of the second object, that
sub-rectangle's weight is only counted once. The resulting sums for a
relationship for the two objects are then multiplied, yielding a weight
for that relationship. For example, consider the case illustrated in
Fig. 5.


A1w = .1; A2w = .1; A3w = .1; A4w = .35; A5w = .35
B1w = .5; B2w = .5

Fig. 5. Non-congruent MRRs with area weights.


For this example, we have the following set of relationships:


A/B    1       2
1      [bd]    $[bb^{-1}]$
2      [bd]    $[bb^{-1}]$
3      [bd]    $[bb^{-1}]$
4      [bo]    $[bo^{-1}]$
5      [bo]    $[bo^{-1}]$

By summing and multiplying the area weights in the manner described
earlier, we obtain the following:

$[bd]_w = 0.15$, $[bo]_w = 0.35$,

$[bb^{-1}]_w = 0.15$, $[bo^{-1}]_w = 0.35$.

This shows that the relationships [bo] and $[bo^{-1}]$ are weighted more
heavily, primarily due to the larger areas of sub-rectangles 4 and 5 of
object A. These weights can then be associated with the higher level
relationships in the same manner as was shown for the uniform MRRs.
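
The sum-then-multiply rule can be checked with a short Python sketch
using the area weights of the Fig. 5 example. The relationship labels in
the table below are a reading of the example and should be treated as an
assumption where the printed table is hard to make out; the resulting
weights do not depend on the labels, only on which sub-rectangles share
a relationship. The function name `relationship_weights` is
hypothetical.

```python
from collections import defaultdict


def relationship_weights(weights_a, weights_b, table):
    """Weight of each relationship: (sum of the distinct A sub-rectangle
    weights involved) times (sum of the distinct B weights involved).

    table maps (i, j) to the relationship between sub-rectangle i of A
    and sub-rectangle j of B.
    """
    a_parts = defaultdict(set)
    b_parts = defaultdict(set)
    for (i, j), rel in table.items():
        a_parts[rel].add(i)  # sets count each sub-rectangle only once
        b_parts[rel].add(j)
    return {
        rel: sum(weights_a[i] for i in a_parts[rel])
        * sum(weights_b[j] for j in b_parts[rel])
        for rel in a_parts
    }


weights_a = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.35, 5: 0.35}
weights_b = {1: 0.5, 2: 0.5}
table = {}
for i in (1, 2, 3):
    table[(i, 1)] = "bd"
    table[(i, 2)] = "bb^-1"
for i in (4, 5):
    table[(i, 1)] = "bo"
    table[(i, 2)] = "bo^-1"

weights = relationship_weights(weights_a, weights_b, table)
for rel, w in weights.items():
    print(rel, round(w, 2))
```

The rounded values reproduce the 0.15 and 0.35 figures quoted above, the
larger weights coming from sub-rectangles 4 and 5 of object A.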

One final possibility for the use of a multiple rectangle representation
includes the application of irregular rectangles - meaning any ad hoc
method of partitioning the object into rectangular sets such that there
is no apparent relationship between the rectangles. This could be done,
for example, by a human operator in order to achieve some objective such
as finding the "best" rectangular coverage of an object, or minimizing
either the number of rectangles used or the false area within
rectangles.

While  this  may  appear  to  be  a  special  case,  careful 
consideration  will  show  that  the  same  method  given 
in  the  section  for  non-uniform  congruent  MRRs  is 
equally  applicable  for  irregular  MRRs. 


5.  Geometric  modelling  capabilities 

The use of MRRs for the ASG model has several implications. First is
that the partitioning of MBRs in the ways described has no effect on the
basic relationship definitions. Any relationship that originally held
between two MBRs remains valid for the resulting MRRs, because minima
and maxima for the objects do not change; therefore, any partitioning of
the MBRs, regardless of granularity, will not affect the basic
relationship between the geometric representations of the objects.

To support this statement, we begin by observing that the abstract
spatial graphs can be arranged in a matrix according to the concept of
conceptual neighborhoods with respect to the relationships that are
represented. That is, each of the relationships that borders a
particular relationship in both the horizontal and vertical directions
can be derived from the given relationship without transitioning through
any other relationship state. If a transition to an immediate
neighboring relationship is impossible, then it follows that a
transition to any other relationship is also impossible. We illustrate
this by first examining the example shown in Fig. 6, which represents a
portion of the complete relationship matrix.


[starts, meets]      [starts, overlaps]      [starts, starts]
[during, meets]      [during, overlaps]      [during, starts]
[finishes, meets]    [finishes, overlaps]    [finishes, starts]

Fig. 6. [during, overlaps] relationship and conceptual neighbors.


[Figure: the [do] relationship with panels (a), (b), (c) and (d)]

Fig. 7. A set of geometric representations for the [do] relationship.

Now, we show that the enhanced accuracy of the boundaries for the MRR approximation model
provides additional information in the way of geometric refinements for the relationship
definitions. While there originally was only one geometric model for each relationship,
there is now an infinite set of such models that correlates to any given relationship.
Simple examples for a standard MRR approach are shown in Fig. 7. Internal rectangle
boundaries are omitted for clarity.

From this illustration one can see that any selected MRR partitioning for a pair of MBRs
will never cause a transition to one of the neighboring relationships. Therefore, we are
able to operate under the assumption of a closed set of relationships for which the ASG
model and query framework hold.

Fig. 7 also demonstrates the advantages associated with the use of MRRs over MBRs: (1) the
ability to better represent correct topology, and (2) the ability to make finer distinctions
between geometric relationships. The first of these is illustrated in Fig. 7(a), which
shows that the two objects could not possibly overlap, while the original MBR
representation shows an overlap. The second advantage can be seen in Fig. 7(a-b) and (c-d),
which show the same topological relationships (disjoint and overlaps) but for which
geometric distinctions exist which provide additional information for spatial analysis.
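
The first advantage can be illustrated with a small sketch in which the MBR test reports an
overlap that the MRR test rules out. The coordinates are invented for illustration, and the
helper names `overlaps`, `mbr` and `mrr_overlap` are hypothetical; rectangles are again
tuples (xmin, ymin, xmax, ymax).

```python
def overlaps(r, s):
    """True if two axis-aligned rectangles share interior area."""
    return r[0] < s[2] and s[0] < r[2] and r[1] < s[3] and s[1] < r[3]

def mbr(rects):
    """Single bounding rectangle of a rectangle set."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def mrr_overlap(a_rects, b_rects):
    """True if any rectangle of one object overlaps any rectangle of the other."""
    return any(overlaps(r, s) for r in a_rects for s in b_rects)

# An L-shaped object and a square lying in the notch of the L.
obj_a = [(0, 0, 6, 2), (0, 2, 2, 6)]
obj_b = [(3, 3, 7, 7)]

print(overlaps(mbr(obj_a), mbr(obj_b)))   # the MBR test reports an overlap
print(mrr_overlap(obj_a, obj_b))          # the MRR test does not
```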

6.  Application  to  conflation 

In this section, we will consider the application of these approaches to the problem of
conflation. Conflation is typically regarded as the combination of information from two
digital maps to produce a third map that is better than either of its component sources.
The history of map conflation goes back to the early to mid-1980s. The first clear
development and application of an automated conflation process occurred during a joint
United States Geological Survey (USGS)-Bureau of the Census project designed to consolidate
the agencies' respective digital map files of US metropolitan areas [12]. The
implementation of a computerized system for this task provided an essential foundation for
much of the theory and many of the techniques used today. Since that time, others,
including commercial GIS vendors, have implemented conflation tools within their
applications. For an example of commercial work on conflation, see [14].

Fig. 8. Conflation process.

Automated conflation of maps is a complex process that must utilize work from a wide range
of subjects, including pattern recognition, statistical and graph theory, and a collection
of heuristics that hardcopy cartographers have used for decades to enhance aesthetics and
increase information content of paper maps.

Conflation of maps typically is needed because: (1) users wish to update their mapping
information without losing legacy data which may not be included in the new information;
(2) one map source may be more accurate with respect to, e.g., positional or attribute
information; or (3) one map source contains information missing in another, such as
additional features, feature attributes or even entire coverages.

Conflation can, in brief, be viewed as a multi-step, iterative process that involves
feature matching, positional re-alignment of component maps and attribute deconfliction of
positively identified feature matches, as seen in Fig. 8.

This process was first presented in [6]. As can be seen from the figure, the process begins
at stage 1 by finding a candidate set of matches for a feature from those in the
complementary database that are geographically close. The "closest" feature and the one
being matched are then compared according to their representation type (point, line or
area) and attribute and value sets.
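
Stage 1 as described above can be sketched as a proximity search followed by a type and
attribute comparison. This is only an illustration under stated assumptions: the `Feature`
class, the search radius, the sample features and the scoring rule (fraction of shared
attributes with equal values) are all hypothetical and are not taken from [6].

```python
import math
from dataclasses import dataclass, field

@dataclass
class Feature:
    name: str
    kind: str                      # "point", "line" or "area"
    x: float                       # representative position
    y: float
    attrs: dict = field(default_factory=dict)

def candidates(feature, others, radius):
    """Features of the complementary database within `radius`, nearest first."""
    near = [(math.hypot(o.x - feature.x, o.y - feature.y), o) for o in others]
    return [o for d, o in sorted(near, key=lambda t: t[0]) if d <= radius]

def stage1_score(a, b):
    """Fraction of shared attribute names with equal values; 0 if types differ."""
    if a.kind != b.kind:
        return 0.0
    shared = set(a.attrs) & set(b.attrs)
    if not shared:
        return 0.0
    return sum(a.attrs[k] == b.attrs[k] for k in shared) / len(shared)

target = Feature("bridge-17", "point", 10.0, 10.0, {"type": "bridge", "lanes": 2})
other_db = [
    Feature("B1", "point", 10.2, 9.9, {"type": "bridge", "lanes": 2}),
    Feature("B2", "point", 14.0, 10.0, {"type": "bridge", "lanes": 4}),
    Feature("R9", "line", 10.5, 10.5, {"type": "road"}),
]

cands = candidates(target, other_db, radius=1.0)
closest = cands[0]
print(closest.name, stage1_score(target, closest))
```

A pair that passes this coarse comparison would then go on to the more detailed analysis of
the later stages.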

Stage 2 continues with a more detailed analysis of the feature pair. Selected neighbors and
their relative positions are investigated. Similarities or dissimilarities are considered
as part of the evidence for or against the hypothesis of equivalent features. Considering
this first for entity nodes (points which do not represent the intersection of edges), we
must investigate properties such as absolute position, positions (distance and direction)
relative to nearby features that have already been determined to be matching features, as
well as similarities in general neighborhood patterns. For connected nodes (nodes that
represent the intersections of edges), geometric considerations include the directions and
lengths of intersecting edges (including the spider function). Orientation and length of
line features, as well as similarity in size and position of bounding boxes can be used.
Similar criteria are used for area features, in addition to the equivalence of values
representing the contained area of each.

The final stage analyses the topology of the two features at the lowest level of topology
supported by the component databases. For products utilizing winged-edge topology,
topological connectives of matching feature candidates can be checked for similarity. This
can include, for example, left and right edges and left and right (adjacent) faces of edges
for line and area features, or the containing face of point features.

Feature matching, simply and perhaps somewhat obviously stated, involves the identification
of features from different maps as being representations of the same geographic entity.
Positional alignment is a mathematical procedure in which previously identified matching
features are brought into spatial agreement, while deconfliction is a process in which
contradictions in a matching pair's attributes and/or values are resolved. Positional
alignment and deconfliction are both steps that are performed after a positive match has
been determined. As such, it is easy to see that accurate feature matching results are
essential to the overall quality of the resulting conflated map.

Two aspects of feature matching in the conflation process can be improved by the
representation techniques we have discussed in this paper. First, the use of an MRR will
permit more accurate matching of geometric properties of features than MBRs; however, the
computational burden of the matching will not be excessive as in direct boundary
representation techniques. Thus, MRRs used in this manner provide a filtering mechanism
that can be refined as needed based on conflation requirements. Secondly, the refinements
of topological relationships, such as we have illustrated in Fig. 7, allow the third stage
of the feature matching process to be enhanced. It is clear that there is an improvement
over the simple use of MBRs, as illustrated in Fig. 6, but still again, it does not require
the large computational overhead demanded by formal topological calculations.
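
The filtering role of MRRs can be sketched as a two-step test: a cheap MBR comparison
first, then a comparison of occupied cells for a uniform decomposition. The sketch assumes
both features are already expressed on the same grid; the exact MBR equality test, the
Jaccard measure and the 0.8 threshold are illustrative assumptions standing in for whatever
tolerances a conflation task would require.

```python
def jaccard(cells_a, cells_b):
    """Overlap of two occupied-cell sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 1.0

def filter_then_refine(mbr_a, cells_a, mbr_b, cells_b, threshold=0.8):
    """Cheap MBR filter first; MRR cell comparison only for pairs that survive."""
    if mbr_a != mbr_b:                                # filter step
        return False
    return jaccard(cells_a, cells_b) >= threshold     # refine step

box = (0, 0, 4, 4)
full = {(r, c) for r in range(2) for c in range(2)}   # object filling its MBR
ell = {(0, 0), (1, 0), (1, 1)}                        # L shape with the same MBR

print(filter_then_refine(box, ell, box, ell))    # same shape
print(filter_then_refine(box, full, box, ell))   # same MBR, different shape
```

With MBRs alone the two shapes in the second call would be indistinguishable; the cell
comparison separates them at modest cost.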

7.  Summary  and  conclusions 

In this paper we have presented several alternative approaches for approximate geometric
boundary representation based on both standard and true MBRs. These include variations on
MRRs for uniform, non-uniform but congruent, and irregular rectangular decompositions. For
the standard MBR-based approaches, we have shown how the abstract spatial graph model for
fuzzy spatial relationships could be extended to accommodate the new representations.

From this work, we believe the most promising approaches lie with the non-uniform but
congruent and the irregular MRRs. These two offer the best trade-off in terms of increased
representational accuracy and maintenance of reduced computational complexity. For future
work, we intend to provide a quantitative analysis of the various representations based on
discrimination capabilities and computational overhead, as well as determine the
implication for directional relationships.


A spatial query interface has been designed and implemented in the object-oriented
paradigm for heterogeneous data sets. The object-oriented approach presented is shown to be
highly suitable for querying typical multiple heterogeneous sources of spatial data. The
spatial query model takes into consideration two common components of spatial data: spatial
location and attributes. Spatial location allows users to specify an area or a region of
interest, also known as a spatial range query. Also, the spatial query allows users to
query spatial orientation and relationships (geometric and topological relationships) among
other spatial data within the selected area or region. Queries on the properties and values
of attributes provide more detailed non-spatial characteristics of spatial data. A query
model specific to spatial data involves exploitation of both spatial and attribute
components. This paper presents a conceptual spatial query model of heterogeneous data sets
based on the object-oriented data model used in the geospatial information distribution
system (GIDS).

© 2001 Academic Press

Keywords: object-oriented technology, spatial query, data integration, quadtree,
object-oriented spatial query, digital mapping, databases.


1.  Introduction 

A query is an interaction between an end-user and one or more databases. The user
formulates a request based on what she/he wants to know. Therefore, without having
in-depth knowledge of the underlying data, many users are able to ask questions pertinent
to their work. A large volume of data is available in many different formats. Users expect
to retrieve all relevant information from multiple data sources. Users then determine the
usefulness of the data based on the quality of the response and the usability of the
response to a query. Query processing is impacted by the ability to abstract data as well
as by the language itself. Therefore, both a data model and the query language need to be
considered in the context of query processing.

Spatial data are complex data types. Spatially related data have two components:
(1) location and its extensions, and (2) attribute data describing non-spatial properties.
Spatial data can be referenced by specific geographic space. Attributes of spatial data
provide information regarding those characteristics that are not necessarily spatial; they
provide a detailed description of the spatial data by supplying values for its properties.
In the typical environment described in this paper, spatial data from multiple sources and
formats (i.e. coverages, scales, etc.) are accessed and integrated to provide query
responses. The object-oriented query approach we will describe is well-suited for providing
the mechanisms needed in such a diverse heterogeneous environment.

Egenhofer defines two fundamental concepts of spatial data abstraction: the view of spatial
data in the form of complex objects, and the interaction with spatial objects through
pertinent operations [1]. The geographic community has accepted the object-oriented
modeling of spatial data as a natural paradigm for geographic objects [2]. The data
encapsulation property of an object-oriented model allows an aggregate of objects to be
considered as a unique, comprehensive object. Thus, there is a greater ability to localize
a query to a subset of objects. An object model is simplified compared to the relational
model since join conditions need not be specified. Another difference from the relational
data model is that pertinent behaviors are encapsulated with the data. Therefore, an object
is a completely self-contained module of state and behavior. Direct queries and
subsequently directed retrieval can be performed on objects. Query processing of an object
extends the traditional extraction of state information to include behaviors of the
spatial object. Thus, query processing of objects involves more than a simple retrieval of
stored information; it may initiate some computation as a behavior of an object and yield a
query response as a result.
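
The idea that a query may trigger a behavior rather than read a stored value can be
sketched as follows. The class `AreaFeature`, its `area` method and the `query` helper are
hypothetical names chosen for illustration; they are not part of the GIDS data model.

```python
from dataclasses import dataclass

@dataclass
class AreaFeature:
    """A spatial object: stored state plus behavior computed on demand."""
    name: str
    vertices: list            # polygon boundary as (x, y) pairs, in order

    def area(self):
        """Behavior: polygon area via the shoelace formula (not stored state)."""
        pts = self.vertices
        twice = sum(x1 * y2 - x2 * y1
                    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
        return abs(twice) / 2.0

def query(objects, predicate):
    """Return the objects for which the predicate holds."""
    return [o for o in objects if predicate(o)]

lakes = [
    AreaFeature("small", [(0, 0), (2, 0), (2, 1), (0, 1)]),
    AreaFeature("large", [(0, 0), (10, 0), (10, 5), (0, 5)]),
]
big = query(lakes, lambda o: o.area() > 10.0)
print([o.name for o in big])
```

No join is specified here: the predicate is evaluated against each self-contained object,
and the area is computed at query time.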

This paper will present a spatial query description to support spatial query processing of
an object-oriented spatial data model. This formal method provides the basis for
articulating the issues and approaches in querying multiple sources of spatial data.
Section 2 introduces issues and background on the integration of heterogeneous spatial data
types as related to the query model, followed by Section 3 on discussion of the spatial
query language issues. Concerns such as the use of an object-oriented methodology and its
implications for querying are discussed. Section 4 addresses a generalized formal
description of the object-oriented spatial query model (oo-sqm). Section 5 provides insight
into the implementation issues of the spatial query model for a geospatial information
distribution system (GIDS), followed by our conclusions and directions for future work in
Section 6.


2.  Heterogeneous  Data  Integration  and  Query  Model 

Three types of query processing must be considered for spatial data: spatial, attribute,
and a combination of spatial and attribute querying. A spatial data search always requires
some reference to the location or an area of interest (AOI). A spatial query model must
consider access patterns of spatial data. In general, data access proceeds from a general
neighborhood search to a specific location. From such general access patterns, both
hierarchical and geographic clustering mechanisms or indexing yield favorable results. A
comprehensive study on different spatial access mechanisms is presented in Gaede [3] with a
detailed study on spatial data structures presented in Samet [4].

Figure 1. An example of meta-data and spatial data attributes


The non-spatial details of spatial data are considered as a part of attribute-level
querying. Usually, a distinction is made between data regarding the spatial information and
data related to the actual description of spatial data (metadata), as shown in Figure 1.
In this figure, an example of one data type used in the GIDS, namely vector product format
(VPF) data, is used to show the difference between metadata and feature attributes.
Information in italics is metadata that describes the form of the actual spatial data
contained in the data set. The italicized and underlined description provides specific
information regarding the characteristics and properties of the spatial data; these are the
attributes of the spatial data. However, in the heterogeneous data integration model, both
the metadata and spatial object description are considered as attributes. For example, in
the figure an instance of the class VPFLibrary contains metadata relative to the scale at
which the data were collected, as well as the security classification of the contained
data, while the feature class VMALakeresa contains an attribute hydrological category that
is not considered to be metadata. Due to the different level of data abstraction for the
meta-data and the spatial data, a knowledge-based system that governs the attribute
evaluation, especially for the metadata, is one consideration to be made. An
attribute-based query may be as simple as a question on one of the attribute types or a
value, or it can be as complicated as a sum of some products of attribute type or values.

In  the  real  world,  there  is  only  one  representation  of  the  spatial  object  at  a  given 
location.  However,  because  of  multiple  data  collections  and  varying  scopes  and  inten¬ 
tions  of  data  users,  many  different  representations  of  the  real  world  may  exist  in  a  digital 
environment.  In  other  words,  based  on  the  collection  criteria,  and  specifically  the 
resolution  at  which  the  data  were  collected,  a  single  real-world  object  may  be  modeled 
differently  in  different  data  sets.  For  example,  at  a  high  resolution,  a  building  may  be 
represented  as  a  polygon  of  its  footprint.  But,  at  a  low  resolution,  the  same  building  may 
be represented as a point feature. Thus, in spatial data modeling, the resolution and scale
at which the data are represented become important, especially in a multi-data source
and format environment.

Spatial data are closely tied to their graphical representations. Therefore, when multiple
data types from data sources are requested to be queried simultaneously, the similarity of
their graphical representations must be ensured. This can be achieved by allowing the user
to specify a scale. Even though there may be spatial objects that meet the attribute
criteria, the collection criteria may have been different. The scale issue can be
considered as a part of attributes if some level of a conflation process as discussed in
Cobb [5] is implemented. Until a knowledge-based system that can handle the feature
matching and feature representation at multiple scales by applying generalization is
developed, the inclusion of scale in a query model implementation is not feasible. However,
the significance and relevance of scale to the issue of spatial queries is undeniable, and
is thus included in the theoretical model and immediately following discussion. It follows
from this presentation, then, that a spatial query is a function of at minimum three
criteria: geo-location or an area-of-interest (AOI), scale, and attributes. Therefore, we
define a spatial query model as

$$\mathit{SQM} = f[\mathit{AOI},\ \mathit{scale},\ \Sigma\,\mathit{attributes}].$$

Figure 2. AOI and related datasets

Due to the multi-resolution issue in multi-datasource databases, the outcome of the SQM
should be a conjunction of the three parameters:
$\mathit{AOI} \wedge \mathit{scale} \wedge \Sigma\,\mathit{attributes}$. This integration of
queries based on AOI is illustrated in Figure 2. This figure shows the area selected by the
rectangle on the map (Persian Gulf region), as well as the active datasets for that region,
i.e. all member datasets that contain data in any format for the region. The figure shows
several such datasets, including VPF (DNC01 and 02) and Generic Sensor Format data. The AOI
is a critical component of the query model that is used at a high level to determine the
datasets over which the remaining query constraints are evaluated.
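
A minimal sketch of this conjunction follows. It assumes point features, rectangular AOIs
and dataset extents given as (xmin, ymin, xmax, ymax), and equality constraints on
attributes; the names `Feature`, `in_aoi` and `sqm` and the sample datasets are
hypothetical and do not describe the GIDS implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    name: str
    x: float
    y: float
    scale: int                       # denominator of the collection scale
    attrs: dict = field(default_factory=dict)

def in_aoi(f, aoi):
    """True if the feature position lies inside the AOI rectangle."""
    return aoi[0] <= f.x <= aoi[2] and aoi[1] <= f.y <= aoi[3]

def sqm(datasets, aoi, scale, attr_constraints):
    """Conjunction of AOI, scale and attribute constraints over active datasets."""
    hits = []
    for extent, features in datasets.values():
        # High-level AOI step: skip datasets whose extent misses the AOI.
        if (extent[2] < aoi[0] or aoi[2] < extent[0]
                or extent[3] < aoi[1] or aoi[3] < extent[1]):
            continue
        for f in features:
            if (in_aoi(f, aoi) and f.scale == scale
                    and all(f.attrs.get(k) == v
                            for k, v in attr_constraints.items())):
                hits.append(f.name)
    return hits

datasets = {
    "harbor": ((0, 0, 10, 10), [
        Feature("pier", 2, 3, 50000, {"type": "pier"}),
        Feature("buoy", 4, 4, 250000, {"type": "buoy"}),
        Feature("pier2", 9, 9, 50000, {"type": "pier"}),
    ]),
    "inland": ((50, 50, 60, 60), [
        Feature("dam", 55, 55, 50000, {"type": "pier"}),
    ]),
}
print(sqm(datasets, (0, 0, 5, 5), 50000, {"type": "pier"}))
```

Testing the AOI against dataset extents first mirrors its role in selecting the active
datasets before the remaining constraints are evaluated.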

Within each parameter, such as the attribute constraints, a combination of logical
operators (AND, OR, and NOT) can be used. A query formulated in this manner can have a
number of constraint issues relative to the combinations of parameters. For example, what
should be the priority of evaluation of the query expression with respect to the three
operators? This ordering can clearly affect retrieval efficiency and require accessing of
more or fewer data sources depending upon the evaluation order.

Another consideration might be an absolute requirement on one or more of the query
parameters. For instance, if a given scale is set as an absolute requirement, this could
dictate a query failure unless the AOI could be changed, e.g. reduced. If, however, a range
of scale is specified, a careful evaluation of the scale range and the attribute
constraints needs to be considered. A range of scale can be considered as a disjunction of
each scale, $(\mathit{scale}_i \vee \mathit{scale}_j)$. Then, using $\wedge$ as the AND
operator and $\vee$ as the OR operator,

$$(\mathit{AOI} \wedge (\mathit{scale}_i \vee \mathit{scale}_j) \wedge \Sigma\,\mathit{attributes}) = (\mathit{AOI} \wedge \mathit{scale}_i \wedge \Sigma\,\mathit{attributes}) \vee (\mathit{AOI} \wedge \mathit{scale}_j \wedge \Sigma\,\mathit{attributes}).$$
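
The identity above is the distribution of AND over OR, and it can be confirmed by
enumerating every truth assignment. The sketch treats each constraint as a Boolean; the
function names `lhs` and `rhs` are ours.

```python
from itertools import product

def lhs(aoi, scale_i, scale_j, attrs):
    """AOI AND (scale_i OR scale_j) AND attributes."""
    return aoi and (scale_i or scale_j) and attrs

def rhs(aoi, scale_i, scale_j, attrs):
    """(AOI AND scale_i AND attributes) OR (AOI AND scale_j AND attributes)."""
    return (aoi and scale_i and attrs) or (aoi and scale_j and attrs)

# Enumerate all 16 truth assignments of the four constraints.
identity_holds = all(lhs(*v) == rhs(*v)
                     for v in product([False, True], repeat=4))
print(identity_holds)
```

In practical terms, a query over a range of scales can be evaluated as one query per scale
whose results are then combined.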

In this context of graphical representation, we may have integrated issues of scale and
resolution. There may be some spatial features, $f_i$, that are collected at
$\mathit{scale}_i$ that meet the AOI and attribute constraints. There are some other
spatial features, $f_j$, that are collected at $\mathit{scale}_j$ that also meet the AOI
and attribute constraints. Due to the differences in the scale, integrated display of both
feature sets would produce a mis-representation of the spatial features. Thus, a
requirement of the SQM is that query results must contain spatial features at the same
scale. Applying a generalization function that maps those spatial
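data collected at a high resolution to the low resolution enables a consistent graphical
representation of both spatial feature sets. Assuming $\mathit{scale}_i$ is at a higher
resolution than $\mathit{scale}_j$, the SQM result is

$$(\mathit{AOI} \wedge (\mathit{scale}_i \vee \mathit{scale}_j) \wedge \Sigma\,\mathit{attributes}) = (\mathit{AOI} \wedge \mathit{scale}_j \wedge \Sigma\,\mathit{attributes}) \wedge \mathit{Gen}_{i \to j}(\mathit{AOI} \wedge \mathit{scale}_i \wedge \Sigma\,\mathit{attributes}).$$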

The generalization function will only modify the spatial component of the spatial object.
The attributes of the spatial objects will not be affected. The non-spatial description of
the spatial object will be at a higher resolution. However, this does not violate the
integrity of the geographical information. The implication is that the generalization
function will be applied only when the data are requested to be displayed.
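
A generalization function of this kind can be sketched as below, using the building example
given earlier (a footprint polygon at high resolution, a point at low resolution).
Collapsing a polygon to the centroid of its vertices is only one possible generalization
and is an assumption of this sketch, as are the names `SpatialObject` and `generalize`.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class SpatialObject:
    name: str
    geometry: tuple                  # tuple of (x, y) vertices; one vertex = point
    scale: int                       # scale denominator of the representation
    attrs: dict = field(default_factory=dict)

def generalize(obj, target_scale):
    """Collapse the geometry to its vertex centroid; attributes are untouched."""
    n = len(obj.geometry)
    cx = sum(x for x, _ in obj.geometry) / n
    cy = sum(y for _, y in obj.geometry) / n
    return replace(obj, geometry=((cx, cy),), scale=target_scale)

building = SpatialObject("depot", ((0, 0), (4, 0), (4, 2), (0, 2)), 10000,
                         {"use": "storage", "floors": 3})
shown = generalize(building, 250000)   # applied only when display is requested
print(shown.geometry, shown.attrs == building.attrs)
```

The stored object is left unchanged; only the copy prepared for display carries the
generalized geometry.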

Likewise, a class hierarchy that exists in the object-oriented data model would allow
attribute abstraction from the superclass level. Due to the inheritance property of the
object-oriented data model, the attributes or the properties of the superclass are also
defined for subclasses.
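
Attribute abstraction at the superclass level can be sketched with a small hierarchy. The
classes `Feature`, `HydroFeature` and `RoadFeature`, their attributes and the `query_by`
helper are hypothetical and serve only to illustrate the inheritance property.

```python
class Feature:
    """Superclass: attributes defined here exist on every subclass instance."""
    def __init__(self, name, security):
        self.name = name
        self.security = security

class HydroFeature(Feature):
    def __init__(self, name, security, category):
        super().__init__(name, security)
        self.category = category     # attribute specific to this subclass

class RoadFeature(Feature):
    def __init__(self, name, security, lanes):
        super().__init__(name, security)
        self.lanes = lanes

def query_by(objects, cls, **constraints):
    """Names of instances of `cls` (or a subclass) matching the constraints."""
    return [o.name for o in objects
            if isinstance(o, cls)
            and all(getattr(o, k, None) == v for k, v in constraints.items())]

data = [HydroFeature("lake", "U", "perennial"),
        RoadFeature("highway", "U", 4),
        RoadFeature("track", "C", 1)]
print(query_by(data, Feature, security="U"))
print(query_by(data, HydroFeature, category="perennial"))
```

A constraint on a superclass attribute reaches instances of every subclass, while a
subclass-specific attribute narrows the query to that subclass.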

In summary, an SQM as defined provides a way to uniquely abstract spatial features by
specifying the three minimum search criteria. An AND function is imposed among the three
constraints. However, in the case where a range of scale is specified, then all the spatial
features that meet the scale range and the attribute constraints are retrieved. In this
case, though, a generalization function must be applied to the spatial features collected
at the higher resolution to display features of different scales in a consistent manner.


3.  Spatial  Query  Language  Issues 
3.1.  Overview  of  Spatial  Query  Approaches 

In this section, we survey various spatial query approaches and then provide an assessment
related to our querying approach. Spatial query languages are by nature more complex than
their alphanumeric counterparts. Additional requirements for spatial query languages,
including the ability to handle queries containing both spatial and non-spatial selection
criteria, as well as providing appropriate contextual graphical display of spatial query
results and intuitive user interface functionality (e.g. point-and-click), greatly
complicate the design and implementation of spatial query languages. Applications of
spatial query languages vary widely and include areas such as geographical information
systems (GIS), pictorial and image and general multimedia databases and CAD/CAM systems.
There are several ways to view the various approaches to spatial query languages. In
particular, here a very important issue is the distinction between languages based on
relational vs. object-oriented models. Also to be considered is the relevant domain, in
particular whether the focus is strictly on spatial data or more generally multimedia,
often including images, audio, video and subsuming some aspects of spatial data also.

First, for relational model-based querying, we must consider those languages based on SQL.
Spatial SQL [6] consists of two separable components: a query language for information
retrieval, and a presentation language for directing the display of spatial data. This
separation of querying was intended to reduce the complexity of the query structure for
users. The graphical presentation language (GPL) is a superset of the query language
spatial SQL, which is an extension of SQL with spatial data handling capabilities. It
provides a new domain, spatial, with specializations for points, lines, area and 3-D
components. Spatial operators and spatial and topological relationships are supported and
can be used in predicates of the WHERE clause of an SQL statement. PSQL [7] was developed
for pictorial database systems. Points, lines and regions comprise the pictorial domain of
PSQL. The language was designed to be flexible, allowing for user-defined operators and
abstract data types, including that of domains. Queries in PSQL are translated to SQL
queries by a preprocessor. It supports topological relationships such as cover, not cover,
overlap, etc.

A number of the query languages that have been developed are not SQL-based but ad hoc
query languages. Some of these can be classified as visual spatial query languages, a
natural interface for spatial data. A query language that allows users to 'draw' a query is
the language Cigales [8]. In this the user must first select the type of spatial relation
they are interested in and then draw a query sketch on which retrieval is based. Another
approach uses an iconic query language [9]. Here the user chooses spatial relations from a
predefined iconically represented set and formulates the query based on these. To keep the
querying more user-friendly, only a relatively small subset of all topological relations
were formulated by icons. Query-by-visual-example [10] is a visual spatial query language
based on an extension of query-by-example [11]. A grid is utilized to form a template of
scenes, primarily specifying cardinal directions. The grid facilitates directional
specifications, but makes it harder to specify approximate distances and topological
relations independent of direction specifications. Another approach permits a sketch that
is somewhat less restrictive in its retrieval matching by utilizing inference mechanisms
for geometric variations in the sketch. This is the spatial-query-by-sketch [12] that is
based on qualitative aspects reflecting human behavior in which the topology is of prime
importance, followed by distance measures to refine the degree of match.

As a preface to discussion of spatial object querying we first describe the state of
development of object query language (OQL) as specified for querying the object data
management group (ODMG) object model [13]. OQL defines an orthogonal expression language,
in the sense that all operators can be composed if the types of operands are valid. OQL has
one basic statement for retrieval of information based on the SQL language syntax of
Select-From-Where. In this it is quite similar to SQL92, with object-oriented extensions
such as complex objects, object identity, path expressions, polymorphism and operator
invocation.

PICQUERY+ [14] was designed for object-oriented, multimedia knowledge-based systems. It
supports fuzzy matching and temporal and object evolutionary event querying and was
developed to meet needs of medical imaging requirements. The relationship terminology used
by PICQUERY+ is somewhat different from that common in geographical domains, using terms
such as left of, above, and so forth. MOQL [15] provides a general query capability for
multiple media and applications and it includes constructs to capture temporal and spatial
relationships in multimedia data. Extensions to the WHERE clause of OQL in the form of new
predicates are provided for temporal and spatial expressions and a 'contains' predicate.

3.2.  Evaluation  of  GIDS  Query  Interface  and  Other  Approaches 

The  main  characteristic  of  our  spatial  querying  approach  is  that  it  has  a  data-driven 
graphical  interface  that  is  dealing  directly  with  an  object-oriented  database.  Now  we  shall 
discuss  the  relationship  of  our  approach  to  the  various  types  of  querying  languages  for 
spatial  data  that  have  been  overviewed  above. 

Several  aspects  to  SQL-based  querying  schemes  contrast  with  our  specific  approach. 
A  basic  difference  is  that  SQL  is  fundamentally  relational]}'  based  and  so  mismatched  to 
our  totally  object-oriented  Smalltalk-based  database.  While  it  is  true  that  SQL3  supports 
objects  as  ADTs  which  describe  new  types  for  attributes,  this  is  in  an  object-relational 
context  in  which  relations  are  still  central  to  the  SQL3  view  of  data  [16].  Another 
problem  with  SQL  is  that  it  and  its  extended  versions  are  primarily  not  spatially  oriented. 
They  require  a  user  to  refer  to  geo-referenced  data  via  an  alphanumeric  language  [6]. 
Such spatial querying requires the user to focus on language syntax and to often translate
a  spatial  image  in  their  mind  into  a  non-spatial  language.  Thus,  in  this  sense  our 
data-driven  graphic  query  interface  is  more  comparable  to  the  graphic  query  approaches 
described  in  the  last  section.  The  sequence  of  Figure  3(a)-(e)  that  follows  demonstrates 
the  benefits  of  the  data-driven  query. 

First,  a  sensor  that  collects  time-varying  information,  shown  as  T  on  the  map  display, 
is  selected.  Upon  selecting  the  sensor,  its  available  attributes  along  with  related  multi- 
media  data  and  nearby  features  are  displayed.  The  Temporal  Query  button  is  enabled  to 
indicate that there is temporal information available for the selected anemometer.

Figure 3. Data-driven query example


Upon  clicking  the  Temporal  Query  button,  a  Temporal  Query  of  Feature  window  is 
displayed.  This  display  allows  the  user  to  select  the  time  of  interest.  The  year,  month,  day 
and  hour  selection  will  be  based  on  the  actual  data  availability  for  the  anemometer. 
Selection  can  stop  at  any  level,  i.e.  year,  month,  day,  or  hour.  In  this  case,  after  selecting 
the  hour,  the  Show  Data  button  is  selected. 

Figure 3. Continued


The  anemometer  during  the  selected  time  has  collected  four  different  types  of 
information,  WaterLevel,  WaveHeight,  WavePeriod,  and  timeStamp.  Upon  selecting 
WaterLevel  as  the  information  of  interest,  another  time  selection  is  provided.  This  time 
selection  is  in  minutes.  Upon  selecting  the  time  period  of  interest,  any  of  the  three 
options can be selected to view the data: statistics, values, or a plot.

Figure 3. Continued


Upon clicking Values to view the data, a list of all the water level information for the
time period is displayed. Upon selecting one of the time instances, images that are
related to this time stamp are displayed. A set of video cameras is positioned at nearby
locations, taking video shots of the shorelines. For the selected time instance, there are
no video shots available. However, there is a set of bathymetric data collected at the
same time at the same location. By selecting the Bathymetric Plot, a real-time
display of the bathy plot is shown.

In  this  example,  the  user  did  not  need  to  have  any  prior  knowledge  of  the  data.  The 
querying  process  was  driven  by  the  available  data,  which  were  displayed  to  the  user  at 
each  step. 

As  a  conclusion  to  our  discussion  on  spatial  query  languages,  we  consider  those 
query  approaches  based  on  OQL  which  are  indeed  object-oriented.  Although  these 
provide  an  object-oriented  stance,  they  still  differ  from  our  approach  by  being 
structured  syntactically  (SQL-like)  and  not  directly  oriented  toward  a  data-driven 
approach. Indeed, one of the key considerations behind the OQL design was
compatibility with SQL [13]. It is understood that OQL extends SQL to
incorporate those unique characteristics of objects [17]. A system which has more in
common with our approach is QUIVER [18], a graphical interface to an object-oriented
database, O2. A formal user evaluation of this system indicated that it was
more  convenient  for  query  formulation  than  the  direct  use  of  OQL.  The  major 
differences  from  our  approach  are  that  QUIVER  is  a  graph-based  language  for  general 
database  querying  and  not  specifically  meant  for  spatial  databases,  and  that  it  also 
translates queries into OQL, thus generating some overhead that our system does not
incur.


4. Spatial Object Query Formalism

As  in  our  general  discussion  of  object-oriented  queries,  the  first  issue  to  be  addressed  is 
the  object  data  semantics.  Let  us  use  the  notation  Z  as  the  universe  of  spatial  objects.  It 
consists  of  all  objects  that  occupy  some  geographic  location. 

In defining 'geographic location', we recognize the fact that many spatial data
modeling strategies rely upon approximate representations of spatial objects both for
computational efficiency issues as well as the simplification of logical modeling
strategies. One of the widely used approximations is the minimum bounding rectangle
(MBR). Our work, as well as that of many others [19-21], relies upon the use of MBRs as
approximations of the geometry of spatial objects. The use of MBRs in geographic
databases is widely practiced as an efficient way of locating and accessing objects in
space [21]. In addition, numerous spatial data structures and indexing techniques have
been developed that exploit the computationally efficient representation of spatial
objects through the use of MBRs [21,4]. Another advantage to the use of an MBR
representation is that all objects can be handled at the same level of dimensionality;
that is, point, line and area features are all represented as 2-D objects across which
operations can be uniformly applied.

Of course, the use of MBRs is inherently problematic to some degree because an
MBR is an approximation of an object's true geometry. The best way to deal with this is
to consider that MBRs are over-estimations of the object's extent and their impact on
queries is to produce 'false drops' (objects whose MBR extent, but not actual
geometry, satisfies the query). If a query refinement is required, the underlying actual
vector representation can then be used to eliminate the false drops.
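
The filter-then-refine idea described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the GIDS code: features are taken to be single line segments, MBRs are `(min_x, min_y, max_x, max_y)` tuples, and all function and feature names (`mbr_of`, `mbrs_intersect`, `segments_cross`, `river_a`, ...) are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def mbr_of(points: List[Point]) -> MBR:
    """Smallest axis-aligned rectangle enclosing all vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def mbrs_intersect(a: MBR, b: MBR) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _orient(p: Point, q: Point, r: Point) -> float:
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """Proper crossing test for two segments (touching/collinear cases ignored for brevity)."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


# Hypothetical features, each a single line segment.
features = {
    "river_a": [(0, 10), (10, 0)],    # really crosses the road
    "river_b": [(8, 0), (10, 1)],     # MBR overlaps the road's MBR, geometry does not
    "river_c": [(20, 20), (30, 25)],  # far away
}
road = [(0, 0), (10, 10)]

# Filter step: cheap MBR test, may include false drops.
filtered = [n for n, g in features.items() if mbrs_intersect(mbr_of(g), mbr_of(road))]

# Refinement step: exact geometry removes the false drops.
refined = [n for n in filtered if segments_cross(road[0], road[1], *features[n])]
```

Here `river_b` passes the MBR filter but is removed by the refinement step, which is the 'false drop' behaviour the text describes.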

The MBR is used as a representation of a feature's geometric or locational aspect. The
abstract object structure we will use here is


$O = (O.loc, O.att) \quad \forall O \in Z$

$O.loc$ is the component of the object that represents the method to determine the
location of the bounding box. The relevant VPF non-spatial attributes for the particular
type of object are represented by the component $O.att$.

In formulating a general spatial query description, we first consider the use of
spatial predicates and attribute predicates. Three different spatial predicates can be
given: (1) spatial predicate over an AOI, (2) topological, and (3) geometric. The
topological and geometric predicates allow querying of spatial relationships with other
spatial data in the AOI. The first aspect of our specific query environment that we must
address is the AOI, as it forms the context for all other discussions. The AOI is the
general region of concern to the user and could potentially be specified in a number of
ways:


$A = \{O \in Z \mid P_{area}(O.loc)\}$.

Since there are a number of specifications for the area, we represent this by the general
predicate $P_{area}$. We want to allow for the possibility of objects that may satisfy a query
but whose position (bounding box) does not entirely lie in the AOI, similar to the buffer
specification used in other systems [23]. So we define

$A^* = \{O \in Z \mid O \in QR \wedge (O \in A \vee (O.loc \cap A \neq \emptyset))\}$

where $QR$ is the set of objects retrieved by the query.

Now that we have specified an AOI, we can formulate the two basic queries
allowed, attribute queries and spatial queries:

attribute query: $QR = \{O \in A^* \mid (O.att_k = v_k)\}$

spatial query: $QR = \{O \in A^* \mid P_{spatial}(O, O'), O' \in A\}$.

The attribute query is interpreted as: Find all objects in the AOI such that the attributes
match the corresponding values in the query. The spatial query can be phrased as: Find
all the objects that have the specified spatial relation to the selected object $O'$. The
specific forms of spatial predicates allowed will be discussed shortly.

We have described the form of individual queries so far, but it is also necessary to
allow a sequence of nested queries, $Q_1, Q_2, \ldots, Q_j$. This is possible since the result of
a query $Q_i$ is a set of objects

$QR_i$, where $QR_i \subseteq A$ (or $A^*$).

The three object classes use the same spatial representation (bounding boxes) and so
the query operations are closed. Thus, we have for nested queries


attribute query: $QR = \{O \in QR_i \mid (O.att_k = v_k)\}$

spatial query: $QR = \{O \in A^* \mid P_{spatial}(O, O'), O' \in QR_i\}$.

This simply means that the query can refer to the set of objects obtained by a previous
query, i.e. the set of objects retrieved can be based on the result $QR_i$ of a previous
query $Q_i$.
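
As a rough illustration of why these query forms nest, the following Python sketch treats every query result as a plain set of objects, so any result can be fed into a later query. The class `SpatialObject`, the functions `attribute_query` and `spatial_query`, and the sample data are hypothetical names introduced here. Two assumptions are made that the formalism does not state: the spatial predicate is read existentially (an object qualifies if it relates to at least one object in the other set), and an object is never compared with itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Set, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass(frozen=True)
class SpatialObject:
    name: str
    loc: MBR                               # plays the role of O.loc
    att: Tuple[Tuple[str, str], ...] = ()  # plays the role of O.att

    def attribute(self, key: str):
        return dict(self.att).get(key)


def intersects(a: SpatialObject, b: SpatialObject) -> bool:
    """Example spatial predicate: the two bounding boxes overlap."""
    p, q = a.loc, b.loc
    return p[0] <= q[2] and q[0] <= p[2] and p[1] <= q[3] and q[1] <= p[3]


def attribute_query(source: Iterable[SpatialObject], key: str, value: str) -> Set[SpatialObject]:
    """Objects of `source` whose attribute `key` equals `value`."""
    return {o for o in source if o.attribute(key) == value}


def spatial_query(source: Iterable[SpatialObject],
                  predicate: Callable[[SpatialObject, SpatialObject], bool],
                  others: Iterable[SpatialObject]) -> Set[SpatialObject]:
    """Objects of `source` related by `predicate` to at least one other object in `others`."""
    others = list(others)
    return {o for o in source if any(predicate(o, o2) for o2 in others if o2 != o)}


aoi = {
    SpatialObject("bridge1", (0, 0, 2, 2), (("type", "bridge"),)),
    SpatialObject("road1", (1, 1, 5, 5), (("type", "road"),)),
    SpatialObject("road2", (10, 10, 12, 12), (("type", "road"),)),
    SpatialObject("river1", (4, 4, 8, 8), (("type", "river"),)),
}

q1 = attribute_query(aoi, "type", "river")   # simple attribute query
q2 = spatial_query(aoi, intersects, q1)      # spatial query against the result of q1
q3 = attribute_query(q2, "type", "road")     # attribute query nested on q2
```

Because each function takes and returns a set of `SpatialObject`, the result of one query is a valid input to the next, which mirrors the closure property noted above.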

In order to reflect the possibility of exactly which objects are returned by a spatial
query when allowing sets of objects to be related, we formulate the following query
result:


$QR = \{O \vee O' \mid P_{spatial}(O, O'), O \in QR_i, O' \in QR_j\}$.

The above query formulation selects the returned result as only the objects $O$ or $O'$. As
a result, some information is lost regarding the specific objects in the other set to which
the resulting objects are related. $QP$ is defined as an extension to $QR$ that returns both
the objects $O$ and $O'$:


$QP = \{(O, O') \mid P_{spatial}(O, O'), O \in QR_i, O' \in QR_j\}$.

$QP$ is of course a different form of result set (object pairs) than the $QR$s that have only
objects. That is


$QP \subseteq QR_i \times QR_j$.

Since $QP$ is a set of pairs of objects, the simple queries above cannot be nested with
results from $QP$. That is, we do not maintain closure of query nesting if a $QP$ result were
used.
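
The difference between the two result forms can be sketched in Python as follows. This is a hedged illustration only: objects are reduced to `(name, mbr)` tuples, the names `query_pairs` and `query_objects` are hypothetical, and returning one chosen side of each pair is one possible reading of the "$O$ or $O'$" result.

```python
from itertools import product
from typing import Callable, Set, Tuple

MBR = Tuple[float, float, float, float]
Obj = Tuple[str, MBR]  # (name, bounding box)


def mbr_intersects(a: Obj, b: Obj) -> bool:
    p, q = a[1], b[1]
    return p[0] <= q[2] and q[0] <= p[2] and p[1] <= q[3] and q[1] <= p[3]


def query_pairs(pred: Callable[[Obj, Obj], bool],
                qr_i: Set[Obj], qr_j: Set[Obj]) -> Set[Tuple[Obj, Obj]]:
    """QP-style result: every related pair (O, O') is kept."""
    return {(o, o2) for o, o2 in product(qr_i, qr_j) if pred(o, o2)}


def query_objects(pred: Callable[[Obj, Obj], bool],
                  qr_i: Set[Obj], qr_j: Set[Obj], side: int = 0) -> Set[Obj]:
    """QR-style result: only one side of each related pair is returned."""
    return {pair[side] for pair in query_pairs(pred, qr_i, qr_j)}


roads = {("road1", (1, 1, 5, 5)), ("road2", (10, 10, 12, 12))}
rivers = {("river1", (4, 4, 8, 8)), ("river2", (11, 11, 15, 15)), ("river3", (4.5, 0, 6, 2))}

qp = query_pairs(mbr_intersects, roads, rivers)
qr = query_objects(mbr_intersects, roads, rivers)
```

`qr` is again a set of objects and can be queried further, whereas `qp` records which river each road is related to but is a set of pairs, so the simple queries no longer apply to it directly.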

The types of predicates we will consider are geometric and topological predicates.
The geometric predicates use either distance or positional functions:

$P_{geometric}(O, O') = (G(O.loc, O'.loc)\ (<, =, >)\ K)$

where $G$ is a distance function and $K$ is some distance value, or just

$P_{geometric}(O, O') = (G'(O.loc, O'.loc))$

where $G'$ is a positional function which is Boolean.

For distance functions based on the bounding box, the distance can be based either
on a centroid-to-centroid or boundary-to-boundary measure. For the latter, the
semantics of the query might require either nearest or farthest boundary measures.
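
The three distance measures mentioned above can be written down for axis-aligned bounding boxes as in the following Python sketch. The function names and the `(min_x, min_y, max_x, max_y)` tuple layout are assumptions made for illustration; the source does not give its own definitions.

```python
import math
import operator
from typing import Callable, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def centroid_distance(a: MBR, b: MBR) -> float:
    """Distance between the centres of the two boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(bx - ax, by - ay)


def nearest_boundary_distance(a: MBR, b: MBR) -> float:
    """Smallest gap between the boxes; 0.0 if they overlap or touch."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


def farthest_boundary_distance(a: MBR, b: MBR) -> float:
    """Largest distance between any point of one box and any point of the other."""
    dx = max(a[2] - b[0], b[2] - a[0])
    dy = max(a[3] - b[1], b[3] - a[1])
    return math.hypot(dx, dy)


def geometric_predicate(a: MBR, b: MBR, compare: Callable[[float, float], bool], k: float,
                        distance: Callable[[MBR, MBR], float] = nearest_boundary_distance) -> bool:
    """Evaluates distance(a, b) <compare> k, e.g. with operator.lt, operator.eq, operator.gt."""
    return compare(distance(a, b), k)


box_a = (0.0, 0.0, 2.0, 2.0)
box_b = (5.0, 0.0, 7.0, 2.0)
within_4 = geometric_predicate(box_a, box_b, operator.lt, 4.0)
within_4_by_centroid = geometric_predicate(box_a, box_b, operator.lt, 4.0, centroid_distance)
```

The same pair of boxes satisfies "within 4 units" under the nearest-boundary measure but not under the centroid measure, which is why the choice of measure has to follow the semantics of the query.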

The topological predicate describes relationships among neighborhoods. Egenhofer
[2] defines nine relationships that completely describe topological relationships among
spatial entities. For the spatial data, a strong assumption is made that an AOI has been
selected to localize a search region. A Boolean value is used to determine spatial
relationships among spatial entities. All spatial entities that are evaluated true with respect
to the topological relationship are added as a query result. The topological predicates we
can specify are again based on $O.loc$ as formulated from the bounding box:

$P_{topological}(O, O') = (T(O.loc, O'.loc))$

where $T$ is a Boolean function representing typical topological functions that can be
used, such as totally contains, intersecting, and non-intersecting.
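
For bounding boxes, the three example functions named above reduce to simple coordinate comparisons. The sketch below is illustrative only; the function names and the closed-boundary convention (boxes that merely touch count as intersecting) are assumptions, not definitions taken from the source.

```python
from typing import Callable, Dict, List, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def totally_contains(a: MBR, b: MBR) -> bool:
    """True if box b lies entirely inside box a."""
    return a[0] <= b[0] and a[1] <= b[1] and a[2] >= b[2] and a[3] >= b[3]


def intersecting(a: MBR, b: MBR) -> bool:
    """True if the boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def non_intersecting(a: MBR, b: MBR) -> bool:
    """True if the boxes have no point in common."""
    return not intersecting(a, b)


def topological_query(objects: Dict[str, MBR], t: Callable[[MBR, MBR], bool],
                      reference: MBR) -> List[str]:
    """Names of all objects for which T(reference, object.loc) evaluates to true."""
    return [name for name, loc in objects.items() if t(reference, loc)]


objects = {
    "inner": (2, 2, 4, 4),
    "edge": (8, 8, 12, 12),
    "far": (20, 20, 22, 22),
}
region = (0, 0, 10, 10)

contained = topological_query(objects, totally_contains, region)
overlapping = topological_query(objects, intersecting, region)
disjoint = topological_query(objects, non_intersecting, region)
```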


5.  Implementation  of  Spatial  Query  Framework 

The GIDS serves as an object-oriented database server that allows objects to be
accessed  via  a  Java  applet.  More  detailed  discussion  on  the  GIDS  can  be  found  in  Chung 
[24].  GIDS  hosts  several  different  data  types:  vector,  raster,  text,  multi-media,  industry 
standard  images,  and  temporal. 

Raster,  text,  and  a  large  portion  of  the  vector  data  are  based  on  the  National  Imagery 
and  Mapping  Agency  (NIMA)  standards.  A  triad  of  mapping  data,  namely  vector 
product  format  (VPF)  [25],  raster  product  format  (RPF)  [26],  and  text  product  standard 
(TPS)  [27]  are  disseminated  by  NIMA  to  military  users.  Each  format  is  produced  to 
serve a specific purpose for military users. Vector data are used for analysis and for
providing more control over display, i.e. display can be decluttered compared to raster
display. On the other hand, the raster data are used primarily for situational orientation.
Text products provide textual descriptions concerning hydrographic and aeronautical
information.

VPF  is  designed  to  support  large  geographic  databases  using  flat  files  or  a  relational 
structure.  VPF  specifies  a  geo-relational  framework  based  on  a  vector  data  model.  This 
format has the most complex specification among the triad. Most VPF products are
digitized from scanned-in paper maps, air photos, satellite data, etc. The uses of VPF
include: (1) distribution of cartographic data products, (2) direct query capability, and
(3) data quality monitoring and recording. Three basic types of information specify VPF
data: non-spatial properties (attributes), geometric properties (coordinates), and
topological properties (connectivity and contiguity relationships). A spatial entity in VPF
is at a feature level. Four types of features are used: point, line, area, and text. An
overview of VPF and topology is discussed in Chung [28]. A discussion of conversion
from  flat  file  structure  to  object-oriented  format  for  VPF  is  provided  in  Arctur  [29]. 

RPF  is  a  standard  database  structure  for  arrays  of  pixel  values.  Both  compressed  and 
uncompressed  forms  of  raster  data  are  available.  RPF  data  is  generated  from  scanned 
charts as well as SPOT imagery. An RPF image is defined by frames. Each frame is
sub-divided into subframes and each frame and subframe represents a specific
geographic region covering an area specified by four corners of a rectangular boundary.
Every subframe has a pixel array that provides color index, value or intensity for each
corresponding pixel location. Figure 4 shows an example of the display of both VPF and
RPF  data  types  for  the  Persian  Gulf  region.  Vector  plots  of  shorelines,  islands,  rivers 
and  currents  are  also  shown  in  this  example.  We  emphasize  here  that  data  are  not 
integrated  merely  at  a  visual  level;  all  data  displayed,  regardless  of  format,  can  be  queried 
regarding  attributes,  metadata,  etc. 

Figure 4. Integrated VPF and RPF data

TPS is the third NIMA format that provides digital textual description of hardcopy
publications published by NIMA, e.g. American Practical Navigator. The format of TPS is
paragraphs with associated figures and tables using standard generalized markup language
(SGML). The content of TPS is used as a secondary information source to augment paper
or digital charts. Every TPS product has an indexed gazetteer that provides relationships
among name, geo-location and paragraph location within the document. The indexed
gazetteer is the main link that is used in generating geo-referenced TPS object data.

The GIDS can currently host video files (mpg, avi, and mov), audio files (wav), and
industry standard image (jpg and gif) formats. Each of these multi-media data sources is
referenced to a certain spatial region. For example, GIDS has an audio clip of Hurricane
Fran that hit North Carolina in 1996 that is referenced to the spatial location of North
Carolina. Figure 5 shows the use of 'T' symbols on a vector map that represent the
availability of temporally registered data for particular geographic locations. Clicking on
one of the symbols brings up a web browser in which the available text is displayed. This
figure also shows an avi file of the currents for the area. Figure 6 shows a variety of
multi-media data, including coastal surveys and space shuttle imagery.

Temporal data persisted in GIDS are collected by the Field Research Facility in Duck,
North Carolina. The temporal data span almost two decades, from 1980 to 1999.
Each time-varying information set was collected by sensors that record changing waves,
winds, tides, and currents [www.frf.usace.army.mil]. These temporal data are referenced
spatially by the sensor's location. Also, each sensor has a reference to the time-varying
information it records.

Figure 6. Multi-media display

All the data that GIDS currently has persisted have a spatial reference. Each of the data
sets has a spatial access mechanism embedded in its class hierarchy. Two factors
determined the design of the spatial access mechanism in GIDS: balancing the tree, and
ease of access. In other words, for the multi-media data sets, a global spatial access
mechanism is used to spatially index the multi-media data. Due to the complexity in the
VPF data organization and the volume of data, a spatial access mechanism is embedded
within each thematic layer [30].

In dealing with multiple data formats and data types, one of the primary issues is
integration. In GIDS, spatial data integration is achieved by the use of a spatial access
mechanism  as  shown  in  Figure  7.  For  each  data  type,  its  data  structure  is  implemented 
according  to  the  specification.  However,  each  data  type  class  hierarchy  always  contains  the 
spatial  access  mechanism  as  one  of  its  instance  variables  to  index  spatial  objects. 


5.1. Spatial Query Over AOI

An  AOI  specification  can  be  created  in  one  of  three  ways  (refer  back  to  Figure  2  for 
illustration):  (1)  specify  origin  and  corner  location  (a  diagonal  of  a  rectangle),  (2)  specify 
a  center  location  and  a  radius,  and  (3)  specify  a  location  with  horizontal  and  vertical 
distance from that location. Using one of the three methods of specifying an AOI, the
spatial  query  is  evaluated  by  using  a  quadtree.  The  following  method  is  executed  to 
perform  the  spatial  query: 

quad returnSetOfIntersectingFeatures: aRect.

A quad is an instance of Spatial Data Manager that manages the spatial index, namely
a quadtree, and aRect is the AOI specified by a user. The use of polymorphism allows
[...] GIDS due to its simple and intuitive design.

Figure 7. Spatial data integration using spatial access mechanism
[Diagram labels: VPF (Library, Coverage), RPF (Subframe), TPS (Gazetteer), Multi-media (Image, Audio, Video), each with a spatial access mechanism (SAM); area of interest.]

The method returnSetOfIntersectingFeatures: uses an intersect method to
determine if a bounding box of a spatial [...] intersect. All cells that
intersect the bounding box of a spatial [...] candidates for the requested spatial
query. A path of all the intersecting cells [...] branch of a quadtree. Since
[...] all elements in the features [...] dimension, shape, and semantics.

A spatial search always begins at the [...]. This search strategy is neither
optimal nor efficient considering the general geographic information
system (GIS) environment. That is, [...] from an area and 'zoom in'
and 'zoom out', needing to access [...] subbranch of a quadtree, [...]
therefore be improved by leveraging this [...] access [...]. Cobb [31] provides a [...]
AOI.

The query model is not limited to a static [...] identified datasets. For
example, Figure 8 shows a list of available data sources (the [...] Feature window) in


different formats that can be integrated into previous query results displayed on the
map. A completely new data source can be selected, or this option can be used to add
new types of features from a previously used source. The integration of the data types
over the AOI through the quadtree is the concept that makes this feasible.

Figure 8. Integrating new data sources for query evaluation
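
To make the AOI search over a quadtree more concrete, here is a minimal Python sketch. It is not the GIDS Spatial Data Manager: the class `QuadTree`, its method `features_intersecting` (a loose analogue of the Smalltalk message shown earlier), the `max_depth` parameter and the sample features are all hypothetical, and the placement rule (each feature is stored at the deepest cell that fully contains its bounding box) is one common design chosen here for brevity.

```python
from typing import List, Tuple

MBR = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def contains(outer: MBR, inner: MBR) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def intersects(a: MBR, b: MBR) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class QuadTree:
    """Each feature lives in the deepest cell that fully contains its bounding box."""

    def __init__(self, bounds: MBR, depth: int = 0, max_depth: int = 6):
        self.bounds = bounds
        self.depth = depth
        self.max_depth = max_depth
        self.items: List[Tuple[str, MBR]] = []   # features that fit no single child
        self.children: List["QuadTree"] = []     # four quadrants, created lazily

    def _child_bounds(self) -> List[MBR]:
        x0, y0, x1, y1 = self.bounds
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return [(x0, y0, mx, my), (mx, y0, x1, my), (x0, my, mx, y1), (mx, my, x1, y1)]

    def insert(self, name: str, mbr: MBR) -> None:
        if self.depth < self.max_depth:
            for i, cb in enumerate(self._child_bounds()):
                if contains(cb, mbr):
                    if not self.children:
                        self.children = [QuadTree(b, self.depth + 1, self.max_depth)
                                         for b in self._child_bounds()]
                    self.children[i].insert(name, mbr)
                    return
        self.items.append((name, mbr))

    def features_intersecting(self, rect: MBR) -> List[str]:
        """Search from the root, descending only into cells that intersect rect."""
        if not intersects(self.bounds, rect):
            return []
        found = [n for n, m in self.items if intersects(m, rect)]
        for child in self.children:
            found.extend(child.features_intersecting(rect))
        return found


quad = QuadTree((0, 0, 100, 100))
quad.insert("bridge", (10, 10, 12, 12))
quad.insert("road", (5, 5, 60, 60))
quad.insert("island", (80, 80, 90, 90))
quad.insert("buoy", (11, 11, 11, 11))

a_rect = (0, 0, 20, 20)  # the AOI, given as a rectangle
hits = quad.features_intersecting(a_rect)
```

Only cells whose bounds intersect the AOI are visited, so the `island` branch is pruned without testing the feature itself. The names returned are MBR-level candidates and may still need refinement against the actual geometry.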

5.2.  Topological  Spatial  Query  Predicate 

Currently, only the intersect relationship among objects is implemented. Two object
types are assumed to determine the intersection relationship. Two-stage intersection
computation is executed: MBR, then object geometry. An approximate intersect
relationship is first imposed on the MBRs by virtue of a spatial query over an
AOI using a quadtree. As a retrieval from a quadtree is in process, each object that is
within the AOI must be tested to determine if the object is one of the object classes
specified as a part of the query criteria. Only those objects that are of the object classes
are retrieved.

An MBR intersection does not guarantee actual object intersection, by the very
definition of an MBR. Therefore, the geometry of each object is used to determine
actual object-to-object intersection. Each object group can be of different object
types (i.e. among RPF, TPS, and VPF) or the result of any previously executed
query.

5.3. Geometric Spatial Query Predicate

Two geometric relationships regarding the distance metric are implemented: within and
greater than. These relationships are mutually exclusive. The upper-bound distance for
the 'greater than' computation is the AOI extent. Figure 9 shows an example of the geometric
query, 'find all transportation lines within 0.5 m from river lines', over the Persian Gulf
region. The query, which is partially hidden in the figure, is stated completely as, 'Q3 = Q1
within distance 0.5 m from Q2', where Q1 is all transportation lines and Q2 is all river
lines.

5.4.  Attribute  Query 

Once  objects  that  meet  the  specified  query  criteria  over  the  AOI  have  been  retrieved, 
each  object’s  attribute  can  be  readily  accessed  based  on  the  data  encapsulation  property 
of  object-oriented  technology.  Two  steps  are  involved  in  evaluating  any  query  that 
specifies attribute constraints. First, all object types within the AOI that are specified as
query  criteria  must  be  retrieved  from  the  quadtree.  Secondly,  each  retrieved  object  must 
be  evaluated  to  determine  if  the  specified  attribute  condition  is  met  or  not. 

Remember that only VPF data have attributes. A VPF object at an abstract class
VPFFeature maintains attribute information as part of its state information. Therefore,
we can inspect the attribute information for all previously retrieved VPF features.

Figure 9. A geometrical query of transportation lines within 0.5 m from river lines is displayed

Query processing at the attribute level implies querying at a feature level. Each feature
has attributes and values associated with each attribute for that feature. Thus an attribute
query is an analysis at a feature level. A sample is 'Find all bridges within the specified AOI that
can be used as a pedestrian walkway'. A pedestrian walkway is a value [...] of an attribute
[...]. Another example of an attribute query is illustrated in
[...]. In this case, circulation currents are shown in a vector, or 'hedgehog', plot.
The ranges of values in knots are differentiated by a colorscale. This example also
[...]


5.5.  Nested  Query 

The query model developed has the capability to allow the construction of complex
queries from simple queries. It is believed that users ask simple questions, then use these
simple questions as building blocks to create complex queries. Therefore, it is important
to be able to maintain a working set of previous queries.

In order to manage and maintain each query and its results, query indexing is
necessary. In other words, each query and its results must be indexed to be readily
accessible. To allow query nesting, the following minimum information regarding each
query must be maintained: query criteria, a query result of object type 1, and a query
result of object type 2. For relative queries such as geometrical or topological, at least
two object types are required, e.g. transportation lines (object type 1) that are within 20
miles of river lines (object type 2). This information is maintained in a list. Therefore,
each query is implicitly indexed by its position in the list. With this information, any user
can readily formulate complete query criteria when making a nested query based on
previous queries. Secondly, the query results do not have to be recomputed, which
provides an overall performance improvement. As spatial objects in the AOI are searched,
a parallel query on each object continues relative to the scale and attribute constraints.
Those spatial objects that meet all the constraints are retrieved. Once the objects are
retrieved, further querying continues at an object level. In other words, through
polymorphism, each object is capable of responding to certain requests. Based on the
implementation of the object behavior, each object would respond appropriately to requests such
as display. An implementation of the spatial query model is summarized in Figure 11.

Figure 11. Spatial query processing in GIDS
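
The list-based bookkeeping described above might look roughly like the following Python sketch. The names `QueryRecord` and `QueryHistory`, the 1-based numbering, and the toy one-dimensional positions used to stand in for a real distance computation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class QueryRecord:
    criteria: str                                  # textual form of the query
    result_type1: FrozenSet[str]                   # result objects of object type 1
    result_type2: Optional[FrozenSet[str]] = None  # second object type, for relative queries


class QueryHistory:
    """Keeps queries in a list so that each is indexed by its position (Q1, Q2, ...)."""

    def __init__(self) -> None:
        self._records: List[QueryRecord] = []

    def add(self, record: QueryRecord) -> int:
        self._records.append(record)
        return len(self._records)  # 1-based index of the new query

    def get(self, index: int) -> QueryRecord:
        return self._records[index - 1]


# Toy 1-D positions in miles, standing in for real geometry.
position: Dict[str, float] = {"road1": 0.0, "road2": 100.0, "river1": 15.0}

history = QueryHistory()
i1 = history.add(QueryRecord("all transportation lines", frozenset({"road1", "road2"})))
i2 = history.add(QueryRecord("all river lines", frozenset({"river1"})))

# Nested query: reuse the stored results of Q1 and Q2 instead of recomputing them.
q1, q2 = history.get(i1).result_type1, history.get(i2).result_type1
near = frozenset(r for r in q1 if any(abs(position[r] - position[v]) <= 20 for v in q2))
i3 = history.add(QueryRecord(f"Q{i1} within 20 miles of Q{i2}", near, q2))
```

The third record stores both object sets together with its criteria, so a later query can refer to `Q3` by position without re-running `Q1` or `Q2`.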

Our spatial query processing begins by specifying an AOI. This AOI is specified as an
MBR. The specified MBR is sent to all the quadtrees of all the different data types. All
the objects that meet the spatial constraint are then evaluated against the scale specified
by the user, followed by the attribute constraints. Once all the objects that meet the
AOI, scale and attribute constraints are retrieved, further queries such as topological or
geometrical queries are processed, if desired. Any query beyond the initial AOI, scale and
attribute query is considered as part of a nested query.

Thus, from the spatial data domain and the design of the GIDS, spatial query
processing is achieved through a parallel evaluation of the AOI, scale, and attribute
constraints as the spatial search continues on a SAM. Once the spatial search finishes over
the SAM, relative and behavioral (polymorphic) queries can be continued on those spatial
objects that met AOI, scale, and initial attribute constraints. Nested queries use the
query results as arguments for the relative comparison and computation. Further
attribute constraints can be imposed on those spatial objects that met previous query
requirements.

Figure 12. Extensions for imprecise querying


6.  Conclusion  and  Future  Work 


An object-oriented approach is recognized as one of the most promising techniques for
modeling complex data such as spatial data. In querying multiple heterogeneous data
sources, the inherent properties of an object-oriented approach, i.e. data encapsulation
and polymorphism, allow for flexibility in achieving a data-driven approach of query
processing. A formal model of spatial querying requires three parameters: AOI, scale, and
attributes. An initial search is conducted over the spatial region. As the spatial search
continues, a parallel evaluation of the scale and attributes is performed. Due to the
polymorphism and data encapsulation properties of the object-oriented approach,
relationship (topological and geometrical) and behavioral queries can be further
evaluated.


We are examining approaches to extensions of predicates to allow linguistic expressions
such as

'... about 5 kilometers ...'
'... slightly overlapping ...'


The approach we have been considering for the semantics of such applications involves
the use of fuzzy logic [31]. Figure 12 shows the extensions and future work that are
anticipated.
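
As a purely illustrative sketch of what a fuzzy reading of 'about 5 kilometers' could look like, the Python fragment below uses a triangular membership function. The triangular shape, the 2 km spread and the names `about` and `about_5km` are assumptions made here; the source does not specify how such predicates would be defined.

```python
from typing import Callable


def about(target: float, spread: float) -> Callable[[float], float]:
    """Triangular membership: 1.0 at target, falling linearly to 0.0 at target +/- spread."""
    def membership(x: float) -> float:
        return max(0.0, 1.0 - abs(x - target) / spread)
    return membership


about_5km = about(5.0, 2.0)  # the spread of 2 km is an arbitrary illustrative choice

degrees = {d: about_5km(d) for d in (3.0, 4.0, 5.0, 6.0, 8.0)}
```

Instead of a true/false answer, each candidate distance receives a degree of match between 0 and 1, which could then be used to rank the objects returned by a distance query.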

[...]y, contiguity, direction and area definition are maintained through VPF's
topology in the following manner:

Figure 7. Two point representations of the [...]

Parity table for attribute RRA.


A  composite  matching  score  is  then  computed  from  the  combination  of  the  expert 
system  weights  and  the  similarity  table  values.  This  score  is  given  as: 
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features F1 and F2 are created that each have the same number of nodes such that each
node  is  the  result  of  merging  the  features’  nodes  based  upon  their  ratios. 

Given  that  the  features  are  in  standard  position  and  the  nodes  have  been  merged,  we 
begin  the  last  phase  in  assessing  shape  similarity.  This  involves  determining  a  normalized 
distance  measure  between  corresponding  nodes  of  the  two  features,  given  as 


feature  matching.  The  subjective  portion  represented  by  the  expert  system  is  much  more 
difficult to capture. In response to this problem, Foley et al. [11], [12] present a web-based
knowledge  acquisition  tool  for  capturing  expert  knowledge  needed  for  conflation. 
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The  authors  faced  the  dual  challenge  of  first 
converting a relational database to the OO paradigm,
then  migrating  it  to  the  Web.  Here  they  describe  the 
procedures  and  technologies  they  used. 


An OO Database
Migrates to the Web

Maria A. Cobb, University of Southern Mississippi


In the past several years, object-oriented methods have found increasing
application in software engineering. OO's advocates claim that the
paradigm's features, such as inheritance, encapsulation, and polymorphism,
offer programmers a flexible and naturalistic means of
tackling complex software design problems. More recently, traditional databases
have been restructured to take advantage of OO methods. Java's advent enhanced
OO's benefits by providing a vehicle for running such applications across several
platforms without the need for extensive cross-platform coding.

Currently,  many  organizations  face  the  challenge  of  migrating  their  legacy  data 
to smaller, more modern applications that can be distributed throughout an organization
via the Web or a company intranet. The National Imagery and Mapping
Agency  is  one  such  organization.  NIMA  is  the  principal  and  often  sole  supplier  of 
mapping  data  for  the  US  Department  of  Defense. 

IEEE Software, May/June 1998
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Originally,  mapping  data  took  the  form  of  paper 
charts,  maps,  and  satellite  photos.  However,  in  the 
1980s NIMA began the tedious process of transforming
their paper products into a more usable
and timely digital format. Vector Product Format
(VPF) [1], a relational data model, became the digital
standard  for  disseminating  NIMA's  charting  and 
mapping  data.  The  Digital  Mapping,  Charting  and 
Geodesy  Analysis  Program  (DMAP)  at  the  Naval 
Research  Laboratory,  Stennis  Space  Center, 
Mississippi,  was  initiated  to  perform  reviews  and 
provide  feedback  on  NIMA's  new  digital  mapping 
prototypes. 

In  1991,  these  reviews  revealed  that  the  relational 
format  chosen  for  the  data  model  was  cumbersome 
and  less  than  optimal  in  several  ways.  The  problems 
stemmed from the difficulty in representing complex
geographic data by decomposing it to fit the
flat-file structure imposed by the data model. The
DMAP team proposed an object-oriented data
model as a possible alternative, and in 1994 secured
funding to develop an OO prototype of the Digital
Nautical Chart, one of the most complex VPF products.
The OO prototype, Object Digital Nautical
Chart, proved successful, and the team went on to
develop object models and prototypes for three
other VPF products: World Vector Shoreline Plus,
Vector Smart Map, and Urban Vector Smart Map. The
more general prototype, known as Object Vector
Product Format (OVPF), could import any of NIMA's VPF
products  distributed  on  CD-ROM,  then  convert  the 
data  into  an  object  format  for  display,  query,  and 
updating  purposes. 

OVPF  continued  to  evolve  over  the  next  two 
years  through  funding  from  the  Office  of  Naval 
Research  and  NIMA.  What  had  been  a  purely  vector- 
based system expanded to include OO models for
NIMA's two other families of digital products, Raster
Product Format and Text Product Standard. RPF is
the basic format for all raster products such as satellite
imagery and scanned charts, while TPS provides
an  SGML-based  standard  for  distribution  of  textual 
information  such  as  sailing  directions. 


Planning for the Web

With the completion of the Object Digital Nautical
Chart component in 1995, team members began
contemplating future directions for the project. Given
NIMA's role as data distributor, and considering the
implications of their newly formed Global Geospatial
Information and Services modernization program,
we knew that electronic dissemination and remote
updating of mapping data should be part of future
plans for the prototype. We considered several
architectures, including the Common Object Request Broker
Architecture and more static, Web-based methods. A
demonstration in the summer of 1995 showed a simple
remote updating capability based on a Web
browser, Perl scripts, GIF images,
and a remote human operator [2]. Although this was a
start, we knew that we wanted a much more sophisticated
level of distributed operation, and the
team began searching for the right direction to take
to provide network-based functionality.

At that time, the Object Management Group's
Corba seemed the most promising architecture for
object-oriented and standard systems interoperability.
Java also made its debut in 1995, awing
developers with powerful new Web-based capabilities.
While the team watched these new developments
closely, investments in these two areas were
not viable at the time, given that Corba had not yet
been accepted as a standard and it was not readily
apparent that Java's ultimate popularity would
match its promise. Therefore, we decided not to pursue
the issue at that time, but to carefully monitor
these evolving technologies while proceeding with
application expansion in other areas.

Opportunity  Knocks 

During this period, the team became involved in
another project funded by NIMA's Defense
Modeling and Simulation Office and Terrain
Modeling Program Office. The first phase of this project
involved the development of a new VPF product
standard and prototype, tailored for use by
members of the modeling and simulation community.
This effort resulted in the Extended Vector
Product Format (EVPF) [3], which, among other things,
allowed the depiction of triangulated irregular networks
to represent terrain data in three dimensions [4].
Using OVPF, project staff developed an OO model
for EVPF and incorporated a prototype data set,
Modeling and Simulation Extended Vector Product,
into the system.

Concurrently with this work, members of the
DoD modeling and simulation community developed
a conceptual model for exchanging synthetic
environment information. When completed, the
Synthetic Environment Data Representation and
Interchange Standard model (SEDRIS) will let heterogeneous
modeling and simulation systems exchange
information using APIs.

Figure 1. Basic system design for the Web-based OO mapping
database. [Diagram labels: Java client (Vmap application): map display,
attribute query, feature selection. Smalltalk server (Vmap application,
SmalltalkBroker): relational data import/OODB, quadtree search for features.
Messages exchanged over CORBA: area of interest; listings for databases,
libraries, coverages; user's selection; set of features as copies.]

Figure 2. The mapping database's world map. Users can select
an area of interest from this map by clicking on a location or typing
coordinates that form a bounding rectangle.

In early 1997, the NIMA EVPF program manager
requested that the NRL team research available OO
technology for the transfer of modeling and simulation
data across a network as a SEDRIS support
subsystem. At that time, Corba 2.0 emerged as a
fairly stable standard with promised vendor support.
Meanwhile, Java exhibited a meteoric rise as
the OO programming language of choice. Given
these trends, we felt secure in using these supporting
technologies for a full-scale migration of our application
to the Web. The "Supporting Technologies"
boxed text on pp. 28-29 provides more information
on the Corba standard and the Smalltalk and Java
programming languages.


System  Design 

The basic architecture of the system is shown in
Figure 1. Our project's goal was to create a Java-based
mapping client that would provide display and query
capabilities for a set of geographic objects (features)
that would be retrieved from the Smalltalk mapping
prototype acting as a server. We planned to base
communication on the Corba specification. We also
planned to keep the remote fetching of objects completely
transparent to the end user, who would be
able to manipulate the features as if they were local.

The retrieval setup we decided upon provides a
world map, shown in Figure 2, from which the user
could select a particular area of interest. The user
could choose an AOI by either selecting a location on
a world map, or by manually entering coordinates
to form a bounding rectangle.
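
To make the two selection modes concrete, the following sketch builds a bounding rectangle either from two typed corner coordinates or from a clicked location. It is a minimal illustration under stated assumptions: `AreaOfInterest`, its field names, the decimal-degree units, and the `halfSizeDeg` extent around a click are all hypothetical and are not taken from the project's VPFBoundingBox class. Rectangles that cross the 180-degree meridian are deliberately not handled.

```java
/** Hypothetical area-of-interest rectangle in decimal degrees; not the project's VPFBoundingBox. */
final class AreaOfInterest {
    final double minLon, minLat, maxLon, maxLat;

    private AreaOfInterest(double minLon, double minLat, double maxLon, double maxLat) {
        this.minLon = minLon;
        this.minLat = minLat;
        this.maxLon = maxLon;
        this.maxLat = maxLat;
    }

    /** Builds the rectangle from two opposite corners typed in any order. */
    static AreaOfInterest fromCorners(double lon1, double lat1, double lon2, double lat2) {
        check(lon1, lat1);
        check(lon2, lat2);
        return new AreaOfInterest(Math.min(lon1, lon2), Math.min(lat1, lat2),
                                  Math.max(lon1, lon2), Math.max(lat1, lat2));
    }

    /** Builds a rectangle around a clicked location, clamped to the valid coordinate range. */
    static AreaOfInterest aroundClick(double lon, double lat, double halfSizeDeg) {
        check(lon, lat);
        return new AreaOfInterest(Math.max(-180.0, lon - halfSizeDeg), Math.max(-90.0, lat - halfSizeDeg),
                                  Math.min(180.0, lon + halfSizeDeg), Math.min(90.0, lat + halfSizeDeg));
    }

    /** Returns true when the point lies inside the rectangle (edges inclusive). */
    boolean contains(double lon, double lat) {
        return lon >= minLon && lon <= maxLon && lat >= minLat && lat <= maxLat;
    }

    // Rejects out-of-range values and NaN (every comparison with NaN is false).
    private static void check(double lon, double lat) {
        if (!(lon >= -180.0 && lon <= 180.0 && lat >= -90.0 && lat <= 90.0)) {
            throw new IllegalArgumentException("coordinate out of range: " + lon + ", " + lat);
        }
    }
}
```

Normalizing the corners on construction means later code can rely on `minLon <= maxLon` and `minLat <= maxLat` regardless of how the user entered them.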

From this input, the application selects and transmits
a bounding box of the geographical region selected,
and the Smalltalk server responds with a set
of OVPF database names, as Corba string objects,
which contain data for that region. Once the user
selects a particular database from the returned set,
the system likewise returns the set of pertaining library
names. Upon selecting a particular library, the
system retrieves a list of coverage names. Finally,
after the user chooses a coverage, the system returns
the set of feature class listings. All this interaction
invokes the appropriate server methods for determining
geographically relevant responses. Finally,
the client retrieves from the server all the features
specified by the user. As opposed to previous communications
between the client and the server, the
features returned at this stage are complex objects.
These feature objects contain both geometric (coordinate)
information and attribute information.
Thus, the features can be displayed both graphically
and textually and can be manipulated in ways similar
to those of server-based operations.
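
The drill-down from database to library to coverage to feature class can be pictured as successive lookups in a nested listing. The sketch below is a hypothetical in-memory stand-in for the server side of that exchange: `CatalogSketch` and its method names are invented for the example, the entries would be made-up placeholders, and the area-of-interest filtering that the real server performs at each step is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for the server's database/library/coverage listings. */
final class CatalogSketch {
    // database -> library -> coverage -> feature class names
    private final Map<String, Map<String, Map<String, List<String>>>> tree = new LinkedHashMap<>();

    /** Registers one feature class under its database, library, and coverage. */
    void add(String db, String lib, String cov, String featureClass) {
        tree.computeIfAbsent(db, k -> new LinkedHashMap<>())
            .computeIfAbsent(lib, k -> new LinkedHashMap<>())
            .computeIfAbsent(cov, k -> new ArrayList<>())
            .add(featureClass);
    }

    /** Step 1: database names offered to the user. */
    List<String> databases() {
        return new ArrayList<>(tree.keySet());
    }

    /** Step 2: library names for the chosen database (empty if unknown). */
    List<String> libraries(String db) {
        return new ArrayList<>(tree.getOrDefault(db, Collections.emptyMap()).keySet());
    }

    /** Step 3: coverage names for the chosen library (empty if unknown). */
    List<String> coverages(String db, String lib) {
        return new ArrayList<>(tree.getOrDefault(db, Collections.emptyMap())
                                   .getOrDefault(lib, Collections.emptyMap())
                                   .keySet());
    }

    /** Step 4: feature class listings for the chosen coverage (empty if unknown). */
    List<String> featureClasses(String db, String lib, String cov) {
        return new ArrayList<>(tree.getOrDefault(db, Collections.emptyMap())
                                   .getOrDefault(lib, Collections.emptyMap())
                                   .getOrDefault(cov, Collections.emptyList()));
    }
}
```

Each method returns only plain strings, mirroring the point made above that the early exchanges carry simple name lists and only the final feature retrieval carries complex objects.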

Figure 3 shows the Java client map GUI, along
with a set of hazard area features that were retrieved
from the server application. Along the left side of
the map is the set of list panes from which the user
selects, and the server alternately supplies, the
listings for database, library, and coverage selections.
As Figure 3 shows, users may display, zoom in, and zoom
out the feature objects.

Additionally, users can request a query window, which appears as
a pop-up window beside the map. The query window displays the
list of feature IDs, from which the user may select for display
associated attribute information. The Select button allows users to
highlight the feature on the map that corresponds to the selected
feature ID in the query window.

Figure 3. The graphical user interface for the mapping database's Java client. The
main pane displays hazard areas retrieved via query; buttons below the panel allow for
zooming and selecting objects and for opening a query window that lists feature IDs.

We loosely based the client map design on the map design
used in the server application. The principal difference at this
time is that no editing functions are available with the
client map; it is strictly for visualization and feature querying.

Implementation

We originally planned to build on the version of
the Smalltalk application that had been integrated
with GemStone's commercial, object-oriented database
management system. GSI was scheduled to release
an ORB-integrated product in the second quarter
of 1997. By the third quarter of '97, however, the
product remained unavailable, so we decided to
move ahead without it. We chose a two-phase migration,
moving first from the memory-based application
to a Corba server application, followed by
a second step that integrated the changes with the
ODBMS ORB when it became available. We began
our work with SmalltalkBroker by DNS Technologies,
primarily because GemStone planned to integrate
this ORB into their ODBMS.

Our transformation of the original Smalltalk application
into a server-based application focused on
three areas:

♦ determining which objects should be server
objects and thus have interfaces defined,

♦ inserting code to provide the Corba services
needed, and

♦ generating the IDL from the existing Smalltalk
classes.

First, we reviewed the set of classes that had direct
responsibilities for feature-level management.
Because, ultimately, the server's job was to return a
set of requested features to the client, we considered
implementation of a single server class adequate
for the application. Two classes emerged as
strong candidates. VPFCoverage, a coverage-level
metaclass, imports the features from the original
VPF relational files and transforms them to a
memory-resident OO format. VPFSpatialDataManager,
a manager class, performs such functions as inserting,
removing, and locating features in the
quadtree spatial indexing structure. Because the
relational import step that VPFCoverage handles
would not be a consideration in phase two of the
project (involving the ODBMS, which operates only
on persistent features already stored in object format),
we selected VPFSpatialDataManager to be
the client's server-interface object.

We made few changes to the class's Smalltalk
implementation because it already contained much of
the functionality for performing the desired tasks,
such as the ability to retrieve features from the
quadtree contained within a specified bounding box.
Our only real additions to class functionality included
methods to determine and transfer appropriate
database, library, and coverage names based on the
client's transmitted area of interest (in the form of a
VPFBoundingBox), and the issuance of messages to
VPFCoverage classes to import requested data that
were not located in the quadtree. We also added
Corba-specific code to perform functions such
as registering the VPFSpatialDataManager object


    interface VPFSpatialDataManager : Smalltalk::Object {
        readonly attribute any maxLevel;
        readonly attribute any storedContainer;

        Retrieval::CorbaVPFFeatColl::byValue ReturnFeaturesForAOI(
            in Retrieval::VPFBoundingBox::byValue aBB);

        Retrieval::CorbaStringColl::byValue ReturnDatabasesForAOI(
            in Retrieval::VPFBoundingBox::byValue aBB);

        Retrieval::CorbaStringColl::byValue ReturnLibrariesForAOI(
            in Retrieval::VPFBoundingBox::byValue aBB, in string dbname);

        Retrieval::CorbaStringColl::byValue ReturnCoveragesForAOI(
            in string dbname, in string libname);

        Retrieval::CorbaStringColl::byValue ReturnFeatListForCov(
            in string dbname, in string libname, in string covname);

        Retrieval::CorbaVPFFeatColl::byValue ReturnFeatures(
            in Retrieval::CorbaStringColl::byValue featColl,
            in Retrieval::VPFBoundingBox::byValue aBB,
            in string dbname, in string libname, in string covname);

        long ReturnFeatureID();
        any CorbaName();
    };

    interface CorbaVPFFeature : Smalltalk::Object {
        attribute string dbname;
        attribute long id;
        attribute Retrieval::AttributeCollection attributes;
        attribute Retrieval::Coordinates coords;
        attribute string covname;
        attribute string libname;
        attribute Retrieval::VPFBoundingBox::byValue boundingBox;

        struct byValue {
            string dbname;
            Retrieval::AttributeCollection attributes;
            Retrieval::Coordinates coords;
            long id;
            string covname;
            string libname;
            Retrieval::VPFBoundingBox::byValue boundingBox;
        };
    };

    interface CorbaVPFFeatColl : Smalltalk::Object {
        attribute Retrieval::FeatureCollection featColl;

        struct byValue {
            Retrieval::FeatureCollection featColl;
        };
    };

Figure 4. The interface definition language and some significant parameter
types for the VPFSpatialDataManager class. The IDL's syntax strongly
resembles C++.


with the SmalltalkBroker Naming Service.
Overall, these types of changes took very
few lines of code to implement.
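
For readers unfamiliar with the structure, the sketch below shows how a quadtree can answer "which features fall inside this bounding box" without scanning every feature. It is a generic point quadtree written for illustration only: the class name, the bucket capacity, the depth limit, and the use of a single point per feature are assumptions, and nothing here is taken from the Smalltalk VPFSpatialDataManager implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal point quadtree; a hypothetical sketch, not the VPFSpatialDataManager implementation. */
final class QuadtreeSketch {
    private static final int CAPACITY = 4;   // assumed bucket size before a node splits
    private static final int MAX_DEPTH = 16; // assumed limit so coincident points cannot split forever

    private static final class Entry {
        final double x, y;
        final int featureId;
        Entry(double x, double y, int featureId) { this.x = x; this.y = y; this.featureId = featureId; }
    }

    private final double minX, minY, maxX, maxY;
    private final int depth;
    private final List<Entry> entries = new ArrayList<>();
    private QuadtreeSketch[] children; // null while this node is a leaf

    QuadtreeSketch(double minX, double minY, double maxX, double maxY) {
        this(minX, minY, maxX, maxY, 0);
    }

    private QuadtreeSketch(double minX, double minY, double maxX, double maxY, int depth) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        this.depth = depth;
    }

    /** Inserts a feature located at (x, y); returns false if the point lies outside this node. */
    boolean insert(double x, double y, int featureId) {
        if (x < minX || x > maxX || y < minY || y > maxY) return false;
        if (children == null) {
            if (entries.size() < CAPACITY || depth >= MAX_DEPTH) {
                entries.add(new Entry(x, y, featureId));
                return true;
            }
            split();
        }
        for (QuadtreeSketch child : children) {
            if (child.insert(x, y, featureId)) return true; // first accepting quadrant stores the point
        }
        return false;
    }

    // Creates four quadrants and pushes this node's entries down into them.
    private void split() {
        double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
        children = new QuadtreeSketch[] {
            new QuadtreeSketch(minX, minY, midX, midY, depth + 1),
            new QuadtreeSketch(midX, minY, maxX, midY, depth + 1),
            new QuadtreeSketch(minX, midY, midX, maxY, depth + 1),
            new QuadtreeSketch(midX, midY, maxX, maxY, depth + 1)
        };
        for (Entry e : entries) {
            for (QuadtreeSketch child : children) {
                if (child.insert(e.x, e.y, e.featureId)) break;
            }
        }
        entries.clear();
    }

    /** Collects the ids of all features inside the query box (edges inclusive). */
    List<Integer> query(double qMinX, double qMinY, double qMaxX, double qMaxY) {
        List<Integer> found = new ArrayList<>();
        collect(qMinX, qMinY, qMaxX, qMaxY, found);
        return found;
    }

    private void collect(double qMinX, double qMinY, double qMaxX, double qMaxY, List<Integer> found) {
        if (qMaxX < minX || qMinX > maxX || qMaxY < minY || qMinY > maxY) return; // no overlap: prune
        for (Entry e : entries) {
            if (e.x >= qMinX && e.x <= qMaxX && e.y >= qMinY && e.y <= qMaxY) found.add(e.featureId);
        }
        if (children != null) {
            for (QuadtreeSketch child : children) child.collect(qMinX, qMinY, qMaxX, qMaxY, found);
        }
    }
}
```

The pruning test at the top of `collect` is what makes the bounding-box retrieval cheap: whole quadrants that do not overlap the query box are skipped without inspecting their contents.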

Figure 4 shows the IDL for the VPFSpatialDataManager
class, as well as some
of the significant parameter types.

As Figure 4 shows, IDL's syntax strongly
resembles that of C++, in which objects
and their corresponding attributes and
methods can be declared. As shown, the
method ReturnFeaturesForAOI returns a
collection of features. The byValue structures
let us pass back complete copies of
the features in the collection. Otherwise,
the client would receive Corba object references
to the objects and would have to
manipulate them remotely through defined
interfaces. We found this undesirable
because of the client's nature as a
display-and-query application. This would
result in frequent system calls, which indicated
that local copies of the data would
be more efficient than remote message
sends for acquiring pieces of feature data
as needed.
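
The trade-off described above can be made tangible by counting round trips. The sketch below involves no ORB at all: `RemoteFeature` merely simulates an object reached through a reference by incrementing a counter on every access, while `FeatureValue` plays the role of a by-value copy. All names and the counter are hypothetical devices for the illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical comparison of reference access and by-value copies; no real ORB is involved. */
final class ByValueSketch {
    /** Counts simulated network round trips. */
    static final AtomicInteger roundTrips = new AtomicInteger();

    /** Plain data copy, analogous in spirit to the byValue structs in the IDL. */
    static final class FeatureValue {
        final int id;
        final String dbname;
        final double[] coords;

        FeatureValue(int id, String dbname, double[] coords) {
            this.id = id;
            this.dbname = dbname;
            this.coords = coords.clone();
        }
    }

    /** Stands in for a server-side object reached through an object reference. */
    static final class RemoteFeature {
        private final FeatureValue state;

        RemoteFeature(FeatureValue state) { this.state = state; }

        // Every accessor costs one simulated round trip.
        int id() { roundTrips.incrementAndGet(); return state.id; }
        String dbname() { roundTrips.incrementAndGet(); return state.dbname; }
        double[] coords() { roundTrips.incrementAndGet(); return state.coords.clone(); }

        /** Transfers a complete copy in a single simulated round trip. */
        FeatureValue fetchCopy() {
            roundTrips.incrementAndGet();
            return new FeatureValue(state.id, state.dbname, state.coords);
        }
    }

    public static void main(String[] args) {
        RemoteFeature remote = new RemoteFeature(new FeatureValue(7, "exampleDb", new double[] {1.0, 2.0}));
        remote.id();
        remote.dbname();
        remote.coords();
        System.out.println("reference access round trips: " + roundTrips.get());

        roundTrips.set(0);
        FeatureValue local = remote.fetchCopy();
        String label = local.id + "@" + local.dbname + " with " + local.coords.length + " ordinates";
        System.out.println(label + ", by-value round trips: " + roundTrips.get());
    }
}
```

For a client that repeatedly redraws and queries the same features, the copy is paid for once, whereas the per-access cost of the reference style grows with every interaction.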

Java  Client 

Essentially,  the  Corba  client  performs 
some  other  component's  requests  in  the 
distributed  application.  Corba  objects 
serve  as  the  components  that  provide 
functionality.  These  objects,  in  turn,  are 
specified using IDL. The system interface's
specification is separate from the implementation,
which lets developers make
changes  to  the  implementation  without 
visibly  altering  the  clients.  The  Corba 
server  implements  the  distributed  objects 
in  the  Smalltalk  server,  then  the  Java  client 
uses  them. 

We  developed  the  Corba  client  in  Java 
as  a  standalone  application  using  Sun's 
Java Development Kit. We used JavaIDL,
another Sun product, for the ORB software.
As the project evolved, however, we found
that JavaIDL's lack of documentation hindered
development. Thus, approximately
one month into the development, we
dropped  it  in  favor  of  Visigenic  Software's 
VisiBroker,  which  immediately  improved 
client-side  development  time. 

Because the IDL had already been generated
for the Smalltalk server, we simply used the
same IDL for the client, which we then compiled
using the IDL compiler. The compile operation automatically
generates the stub and skeleton Java
code necessary to convert object references into network
connections to a remote server. These references
then marshal arguments of an operation to
the corresponding method invocation. Project staff
then wrote corresponding Java code to fill in the
skeletons. Upon execution, the method returns the
results to the client application. The following code
segment demonstrates how an object reference is
initiated and shows a subsequent method request
to return CorbaVPFFeatures.

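    // 1. Turn the stringified IOR into a generic Corba object reference.
    org.omg.CORBA.Object obj =
        orb.string_to_object(ior);

    // 2. Narrow the generic reference to the server interface.
    Retrieval.VPFSpatialDataManager vpfRef =
        Retrieval.VPFSpatialDataManagerHelper.narrow(obj);

    // 3. Request the features for the area of interest (aoi).
    Retrieval.CorbaVPFFeature cvpfeat =
        vpfRef.ReturnCorbaVPFFeature(aoi);

    System.out.println("Feature name is " + cvpfeat.dbname);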

In the first line, the ior parameter is the interoperable
object reference for the server object. IORs
are specified by Corba as a way to provide vendor-
and machine-independent specifications of server
objects. The IOR contains, in string format, all the information
about an object that an ORB needs to locate
it. The IOR is written to a file by the Smalltalk
application upon creation of the server object. The
Java program then reads this file and subsequently
uses the IOR to initiate a server connection.
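
The file hand-off described above might look like the following on the Java side. This is a sketch under assumptions: the helper class and method names are hypothetical, and the file location is whatever the server was configured to write. The only format fact relied on is that a stringified IOR consists of the prefix `IOR:` followed by hexadecimal digits; the returned string is what would be handed to `orb.string_to_object`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for loading a stringified IOR that a server wrote to a file. */
final class IorFileSketch {
    /** Reads the file, trims surrounding whitespace, and rejects content that is not a stringified IOR. */
    static String readIor(Path file) throws IOException {
        String ior = new String(Files.readAllBytes(file), StandardCharsets.US_ASCII).trim();
        if (!looksLikeIor(ior)) {
            throw new IOException("Not a stringified IOR: " + file);
        }
        return ior;
    }

    /** Checks the "IOR:" prefix followed by a non-empty, even-length run of hex digits. */
    static boolean looksLikeIor(String s) {
        if (s == null || !s.startsWith("IOR:")) {
            return false;
        }
        String hex = s.substring(4);
        return !hex.isEmpty() && hex.length() % 2 == 0 && hex.matches("[0-9a-fA-F]+");
    }
}
```

Validating the string before contacting the ORB gives a clearer error when the server has not yet written the file or has written it only partially.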

The third line of code demonstrates the actual request
made to the server for the geographical features:
the AOI is passed as a parameter, and the returned
set of features is stored in cvpfeat. The user
can then display and query these features locally.

Web-integrated applet

Once the standalone application worked properly,
we began transforming the application to a browser-capable
applet. Having the client run as a Java applet
rather than a Java application required additional configurations
because Web browsers impose security
restrictions on Java applets' file input and output operations.
To circumvent this obstacle, VisiBroker provides
two utilities: GateKeeper and Smart Agent.
Essentially, the GateKeeper enables an applet to
communicate with object servers across networks
without violating the security restrictions imposed
by firewalls and Web browsers. The GateKeeper acts
as a gateway from an applet to server objects, even
if a firewall restricts access. Initiating the GateKeeper
simply requires that the utility, gatekeeper, be initiated
on the server. Doing so triggers creation of
GateKeeper's IOR file, which contains important information
that enables the ORB runtime inside the
applet to connect with the GateKeeper.

The Smart Agent is a dynamic, distributed directory
service that provides facilities for server objects
and client applications. All object implementations
are registered with the Smart Agent so that when a
client attempts to reference an object, the Smart
Agent locates the correct object implementation.
Once an object is destroyed, the Smart Agent removes
it from the list of available objects. Initiating
the Smart Agent requires that the osagent utility be
invoked on the server.

The resulting applet executes in the Netscape 4.0
Java-enabled browser exactly as does the standalone
application, maintaining its original behavior
and interface. The Netscape user is presented with
the same GUI and makes requests in the same manner
as described previously. The only requirement
is that the server ORB continue running on the host
to service incoming data requests.

Testing

At first, server and client development proceeded
independently, with simple Smalltalk client and Java
server code used to incrementally test the applications.
This was chosen as the original development
strategy because, for the most part, team members
were conversant in either Java or Smalltalk, but not
both. When it came time to test the Java-Smalltalk
connection, however, we needed close cooperation
and constant communication to fine-tune the client-server
interactions.

The successful distributed application we developed
proved the effectiveness of our approach
and provided a foundation for a new era of
research in OO digital mapping. Remaining work
includes the implementation of the ODBMS/ORB-coupled
server application. Another extension involves
the ability to pass raster objects in addition to
the vector objects already supported.

Once  these  tasks  are  accomplished,  modeling 
and  simulation  users  and  others  interested  in 
mapping  data  will  have  access  to  map  objects 
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In  our  development  of  the  NIMA  EVPF  mapping  database, 
we  chose  several  complementary  technologies  to  convert  the 
application to an OO paradigm and migrate it to the Web. Chief
among these were Corba, Smalltalk, and Java.


Corba 

The Object Management Group developed Corba as a standard
for distributed objects. The OMG itself does not develop or
sell software to support the standard, but rather serves as a consortium
of software vendors and end users interested in defining
object standards for the industry. Commercial companies,
who may or may not be members of the consortium, then develop
and market products that support these standards.

As with any standard, the development of Corba has been a
lengthy and arduous task. For much of the first half of the 1990s,
debate raged over Corba's status as a standard. Although the
publications of the Corba 1.0 and 1.1 specifications seemed
promising, with some vendor support being provided in commercial
products, there was still little or no interoperability
among those products. Thus, most practitioners felt that Corba
remained a work in progress.

Figure A. Basic Corba setup. The Interface Definition
Language (IDL) and interface repository facilitate communication
between different programming languages,
while the Internet Inter-ORB Protocol (IIOP) enables
Corba's platform and implementation independence.

In December 1994, OMG introduced the Corba 2.0 specification
with its much-anticipated extensions to the earlier version.
This version seemed to better satisfy vendors as to its stability
and marketability; thus commercial products that supported the
standard, and therefore offered interoperability between vendor-specific
implementations, became available. Another leap forward
came with the advent of Netscape Communicator's inclusion
of Visigenic Software's VisiBroker, a Java object request broker.

Corba specifies a complete middleware architecture for distributed
object communication. It is one part of the Object
Management Architecture published by the OMG in 1990.
Specifically, Corba represents the communications element of
the OMA, responsible for interoperable handling of message distribution
between application-level objects. The ORB, Corba's key
component, is a software bus that acts as a broker of all requests
and is responsible for intercepting requests, locating the appropriate
object for handling the request, and invoking the correct
method on that object. The ORB must also marshal and unmarshal
any parameters by converting them to a common data type
and then back to a programming language-specific type, and return
any results from the method invocation. Any object may be
either a requestor of a service, a provider of a service, or both.
Further,  any  two  ORBs  that  follow  the  Corba  2.0  specification  can 
provide  communication  between  their  respective  application 
objects,  regardless  of  vendor-specific  implementation  details. 

The ORB communicates requests between different programming
languages via the Interface Definition Language. IDL
is a declarative language used to specify object interfaces and
definitions. With IDL you can specify interfaces with inheritance
properties, operations to which an object can respond, attributes,
and types, among other things. Related IDL definitions are
grouped into modules. IDL syntax itself is very similar to C++, but
components defined in IDL can be implemented in any language
that has Corba bindings, such as Ada, C, Smalltalk, and Java. IDL
provides the common definitions through which objects defined
in different programming languages can communicate.

Another major Corba component, the interface repository,
acts as a metadata repository for the ORB. The repository registers
server objects' IDL interfaces for public access. Users can retrieve
information about interfaces, methods, and parameters
from the repository dynamically.

Finally, one of Corba 2.0's most significant contributions, the
Internet Inter-ORB Protocol, enables Corba's platform and implementation
independence. An IIOP message is essentially a byte
stream that adheres to a network protocol. IIOP is used both in
Netscape Navigator (with LiveConnect) and as the basis for remote
procedure call implementation in Java [1]. An ORB must provide facilities
for marshalling (converting messages into IIOP for transmission)
and unmarshalling (converting IIOP packets into a local
format). Figure A shows the basic Corba communication setup.
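
The idea of marshalling a request into a byte stream and unmarshalling it on the other side can be sketched with the standard Java data streams. This is emphatically not the IIOP wire format: the `Request` type, its fields, and the byte layout produced by `DataOutputStream` are assumptions chosen only to show the round trip from typed values to bytes and back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative marshalling of a request into bytes and back; this is NOT the IIOP wire format. */
final class MarshalSketch {
    /** Hypothetical request: an operation name plus a bounding box. */
    static final class Request {
        final String operation;
        final double minLon, minLat, maxLon, maxLat;

        Request(String operation, double minLon, double minLat, double maxLon, double maxLat) {
            this.operation = operation;
            this.minLon = minLon;
            this.minLat = minLat;
            this.maxLon = maxLon;
            this.maxLat = maxLat;
        }
    }

    /** Converts the typed request into a language-neutral sequence of bytes. */
    static byte[] marshal(Request r) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(r.operation);
            out.writeDouble(r.minLon);
            out.writeDouble(r.minLat);
            out.writeDouble(r.maxLon);
            out.writeDouble(r.maxLat);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the typed request by reading the fields in the order they were written. */
    static Request unmarshal(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String operation = in.readUTF();
            double minLon = in.readDouble();
            double minLat = in.readDouble();
            double maxLon = in.readDouble();
            double maxLat = in.readDouble();
            return new Request(operation, minLon, minLat, maxLon, maxLat);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both sides must agree on field order and encoding; in Corba that agreement comes from the IDL and the protocol specification rather than from hand-written code like this.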
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Jon Siegel [2] provides a good source for those interested in
learning more about Corba, as does the specification itself [3].

Smalltalk

This completely object-oriented language, introduced in the
early 1980s, uses a graphical, interactive programming environment.
Smalltalk is based on the concept of communicating entities,
known as objects. Objects are normally regarded as instances
of classes that have attributes and that implement methods for
performing operations. Communication between objects occurs
when one object sends a message to another, causing that object's
method of the same name to be invoked. The ability to define
objects that interact in this way enables the incremental development
of very complex systems. Starting with just a few
objects, programmers can implement basic capabilities quickly,
then add to or refine these objects until the system is complete.
Smalltalk's interactive environment allows changes to be made,
and their effects known, in a very short period. Adele Goldberg
and David Robson [4] provide an excellent reference for those wishing
to learn more about Smalltalk, and David Smith [5] gives a quick
introduction to object-oriented concepts in general.

Since the beginning of NRL's object-oriented mapping work
in 1994, we have used ParcPlace-Digitalk's VisualWorks Smalltalk
environment, in conjunction with OTI's ENVY/Developer source
code manager, to let multiple developers make changes to the
source code. Our development platform consists of several Sun
Sparc workstations running the Solaris operating system, with
one workstation per developer. This has been an extremely
effective setup, and continues to be the development environment
of choice for the distributed mapping project. Kevin Shaw
and colleagues provide more details on the development
history of the prototype and the impact of using an OO approach
for the project.⁶

Java 

Java's object-oriented programs, embeddable in Internet
Web pages, are commonly referred to as applets. For applets to
run, a reference to the actual Java program name must be specified
in a Hypertext Mark-up Language document. This allows
applets to be executed either within a Java-enabled Web browser
such as Netscape or Internet Explorer, or by using the applet
viewer provided in the Java Development Kit.

You can also use Java to create standalone applications as
you would in C or C++. Although applications differ from applets
in that they cannot be executed in a Web browser, converting
between the two is generally straightforward. The major issue in
developing applets is that such programs cannot read or write to
files because of the security restrictions built into Java. You must
use Corba-like software designs to overcome this obstacle.

Java's syntax is similar to that of C++. Thus, developers
migrating to Java from C++ will typically encounter few difficulties.
Programmers can grasp Java's fundamental concepts quickly,
which helps them be productive from the outset. This proved the
case for us as well. To learn more about Java, consult Ivor Horton's
excellent introductory text⁷ and James Gosling and Frank Yellin's
reliable reference.⁸ We also found Andreas Vogel and Keith
Duddy's Corba-specific reference⁹ invaluable for this project.

from any platform that has a network-connected Web browser. This would
let less capable client-side machines simulate the functionality of more
powerful server machines. Future plans also include the ability of client
users to modify data and pass the modifications back to the server for
subsequent distribution. For example, in a modeling and simulation
scenario, one user could perform an action that affects either the
geometry or attributes of a feature, such as destroying part of a bridge,
then have that information relayed back to the server so that
collaborative models could be informed of the change.

After reviewing the project's first-phase implementation, we believe that
the implications of migrating the original application to a server
application are tremendous. Web-based mapping repositories are clearly the
future for both military and civilian mapping needs, and Corba 2.0 appears
to be a step in the right direction for providing these capabilities.

Abstract. Many facets of spatial data representation inherently involve issues of
accuracy and uncertainty. This problem is greatly magnified when considering the
integration of spatial data from different sources, such as in a distributed or
interoperable environment. The general concept of schema merging involves the
resolution of incompatibilities as in a distributed environment. These may be
either structural or semantic in nature. Structural incompatibilities involve those,
for example, in which attributes for representing the same values are defined
differently. Semantic incompatibilities, however, represent those cases in which
similarly defined attributes have different meanings or values. For example, an
attribute of WIDTH for a road in one database may include the width of
associated access lanes, while in another database it may be only the main
driveable portion of the road. Such semantic issues are much more difficult to
resolve, as they require a deeper understanding of the data. We will survey the
issues as discussed above for spatial data in such environments and describe
several approaches for different aspects of the data using fuzzy set techniques to
deal with the incompatibilities.

Keywords. fuzzy GIS, conflation, uncertainty, spatial databases


1  Introduction  to  Spatial  Data  Uncertainty 

The database field is currently extending its paradigm from a mostly relational
approach towards an object-oriented direction. It is not clear yet what forms these
object-oriented extensions will take. Indeed, there is little agreement upon which
features are needed, since this approach did not spring from a single source as the
relational model did from Codd's research [1]. Likewise, the scope of database
capabilities is expanding to encompass multimedia, including text, pictures and
sound, particularly in a distributed environment. This is of significance in
supporting complex applications such as CAD/CAM design systems and
geographical information systems (GIS). Such a range of database approaches
represents part of a broad spectrum of information systems encompassing even the
processing of large text-type files conventionally construed as the realm of
information storage and retrieval systems.

The importance of representations of uncertainty in databases is increasing as
more complex applications such as CAD/CAM and GIS are being undertaken in
object-oriented and multi-media databases. In this paper we consider a number of
issues related to uncertainty in spatial data representations and GIS, and the
impact on a distributed system in dealing with them. The relational model has
been the dominant database model for a considerable period, and so it was
naturally used by researchers to introduce fuzzy set theory into databases. Much
of the work in the area has been in extending the basic model and query languages
to permit the representation and retrieval of imprecise data. A number of related
issues such as functional dependencies, security, implementation considerations
and others have also been investigated.

Two major approaches have been proposed for the introduction of fuzziness in
the relational model. The first one uses the principle of replacing the ordinary
equivalence among domain values by measures of nearness such as similarity
relationships [2], proximity relationships [3], and distinguishability functions [4].
The second major effort has involved a variety of approaches that directly use
possibility distributions for attribute values [5, 6, 7]. There have also been some
mixed models combining these approaches [8, 9]. These approaches have also
been extended to object-oriented databases [10]. We can see that variations of all
of these approaches are of use in modeling spatial and geographic data, and some
will be described shortly.
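
The first approach, in which strict equality between domain values gives way
to a measure of nearness, can be sketched in a few lines of Python. The
land-cover domain, the similarity values, and the function names below are
hypothetical and serve only to illustrate the idea; they are not taken from
the cited models.

```python
# Hypothetical similarity relation over a land-cover domain (values in [0, 1]).
SIMILARITY = {
    ("forest", "woodland"): 0.9,
    ("forest", "scrub"): 0.6,
    ("woodland", "scrub"): 0.6,
}


def similarity(a, b):
    """Symmetric, reflexive lookup; unlisted pairs count as fully dissimilar."""
    if a == b:
        return 1.0
    return SIMILARITY.get((a, b), SIMILARITY.get((b, a), 0.0))


def select(rows, attribute, value, threshold):
    """Return rows whose attribute is similar to `value` at or above `threshold`."""
    return [row for row in rows if similarity(row[attribute], value) >= threshold]


parcels = [
    {"id": 1, "cover": "forest"},
    {"id": 2, "cover": "woodland"},
    {"id": 3, "cover": "scrub"},
]
```

With a threshold of 1.0 the selection behaves like an ordinary equality test;
lowering the threshold to 0.8 also retrieves the parcel labeled "woodland" when
the query asks for "forest".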


The need to handle imprecise and uncertain information concerning spatial data
has been widely recognized in recent years (e.g., [11]), particularly in the field of
GIS. GIS is a rather general term for a number of approaches to the management
of geographic and spatial information. Most definitions of a geographic
information system [12, 13] describe it as an organized collection of software
systems and geographic data able to represent, store and provide access for all
forms of geographically referenced information. At the heart of a GIS is a spatial
database. The spatial information describes the location and shape of geographic
features in terms of points, lines and areas.

There has been a strong demand to provide approaches that deal with
inaccuracy and uncertainty in GIS. The issue of spatial database accuracy has been
viewed as critical to the successful implementation and long-term viability of GIS
technology [11]. The value of a GIS as a decision-making tool is dependent on the
ability of decision-makers to evaluate the reliability of the information on which
their decisions are based. Users of geographic information system technology
must therefore be able to assess the nature and degree of error in spatial databases,
track this error through GIS operations, and estimate accuracy for both tabular and
graphic output products. There is a variety of aspects of potential errors in GIS
encompassed by the general term "accuracy." However, here we focus on those
aspects that lend themselves to modeling by fuzzy set techniques.


1.1  Significance  of  Uncertainty  Modeling  for  Spatial  Data  Systems 

There have been a number of recent indications of the importance of uncertainty
modeling in spatial data. Two in particular are of most significance. First, the
National Imagery and Mapping Agency of the United States (NIMA) announced
for fiscal year 1997 a new program of University Research Initiatives. One of the
major topics is uncertainty in geospatial information representation, analysis and
decision support. The following are some of the main aspects:


i.  Elements  of  Uncertainty:  Geospatial  information  is  extremely  complex  and 
includes  several  aspects  that  may  have  associated  uncertainties.  These  include 
location,  relationships  and  typologies.  They  requested  proposals  to  identify  and 
describe  all  aspects  of  uncertainty  associated  with  geospatial  information. 

ii. Models for Uncertainty: The goal here was to develop extensions to existing
geospatial data models that accommodate the elements of uncertainty.

iii.  Propagation  of  Uncertainty:  For  this  aspect  of  the  proposed  efforts,  they 
requested  development  of  algorithms  for  determining  how  uncertainty  is 
propagated  through  the  fusion  and  analysis  of  geospatial  information. 

Secondly,  the  University  Consortium  for  Geographic  Information  Science 
(UCGIS)  has  published  a  major  position  paper  [14]  of  research  priorities  for 
geographic  information  science.  In  this,  they  state  that  the  uncertainty  information 
associated with a geographic data set can be conceived as a map depicting varying
degrees  of  uncertainty  associated  with  each  of  the  features  or  phenomena 
represented  in  the  data  set,  and  potentially  separable  into  three  components: 
uncertainty  in  the  typological  attributes  (describing  the  type  of  a  geographic 
feature),  uncertainty  in  the  locational  attributes,  and  uncertainty  in  spatial 
dependence  (the  spatial  relationship  with  other  features). 

Uncertainty  is  seen  as  appearing  in  every  part  of  the  geographic  data  life  cycle: 
data  collection,  data  representation,  data  analyses,  and  results.  The  data  that  passes 
through  the  stages  of  observation  to  eventual  archiving  may  be  handled  by  a 
variety  of  individuals/organizations,  each  of  whom  may  provide  their  own  distinct 
interpretations to the data. So, the uncertainty is mostly a function of the
relationship  between  the  data  and  the  user,  i.e.,  a  measure  of  the  difference 
between  the  data  and  the  meaning  attached  to  the  data  by  its  current  user.  The 
UCGIS  emphasized  that  research  was  needed  in  studying  in  detail  the  sources  of 
uncertainty  in  geographic  data  and  the  specific  propagation  processes  of  this 
uncertainty  through  GIS-based  data  analyses,  in  developing  techniques  for 
reducing,  quantifying,  and  visualizing  uncertainty  in  geographic  data,  and  for 
analyzing  and  predicting  the  propagation  of  this  uncertainty  through  GIS-based 
data  analyses. 


1.2  Sources  of  Imprecision  in  Spatial  Data 

There is a variety of sources of imprecision in geographic information systems
that are manifested as several types of uncertainty: (1) uncertainty due to
variability or error; (2) imprecision due to vagueness; and (3) incompleteness due
to inadequate sampling frequency or missing variables [15]. Both uncertainty of
interpretation and inherent ambiguity are illustrated by the labeling of data such as
that obtained from Landsat images. The images are initially processed by
unsupervised classifications to obtain image classes and then the results are
subjectively assigned land cover or resource class labels by a human interpreter.
This is an inherently subjective task in which the interpreter attempts to match
objectively derived image classes with linguistic concepts that are represented in
the mind of the interpreter. It is not surprising that there is variation in the
interpretation of the same data among interpreters. This is particularly
troublesome when the result is stored in a database, because at this point an
inherently imprecise concept requires a specific representation. As a result,
uncertainty modeling in GIS has been used for the classification of land according
to soil type, vegetation cover, and land use (e.g., [16]). Uncertainty representations
in both probabilistic [17] and fuzzy [18] frameworks [19] have been utilized, and
Fisher [20], [21] has striven to elucidate the fundamental differences between
them in this context. In applications involving remotely sensed information and
typical multiple sources of information used to formulate geographical data, the
problems of imprecision and uncertainty are of even more concern [22].

Many operations are applied to spatial data under the assumption that features,
attributes and their relationships have been specified a priori in a precise and exact
manner. However, this assumption is generally not justifiable, since inexactness
is almost invariably present in spatial data. Inexactness exists in the positions of
features and the assignment of attribute values, and may be introduced at various
stages of data compilation and database development. Moreover, inexactness may
be propagated through GIS operations to appear in modified form on tabular and
graphic output products. Inexactness is often inadvertent, as in the case of
measurement error or imprecision in taxonomic definitions, but may also be
intentional since generalization methods are frequently applied to enhance
cartographic fidelity.

Models of uncertainty have been proposed for GIS information that incorporate
ideas from natural language processing, the value of information concept,
non-monotonic logic, and fuzzy set, evidential, and probability theory. For example, in
[23] there are reviews of four models of uncertainty based on probability theory,
Shafer's theory of evidence, fuzzy set theory, and non-monotonic logic. Each
model is shown as appropriate for a different type of inexactness in spatial data.
Inexactness is classified as arising primarily from three sources. "Randomness"
may occur when an observation can assume a range of values. "Vagueness" may
result from imprecision in taxonomic definitions. "Incompleteness of evidence"
may occur when sampling has been applied, there are missing values, or surrogate
variables have been employed.


It is now clear that uncertainty concepts must be part of the apparatus of GIS
[24]. Limited positional accuracy is found in maps because of the process of map
production and the inherent limitations of Earth measurement systems. Positional
accuracies better than about 1 part in 10^4 or 10^5 are uncommon in paper maps
[11], but much greater precision is available in computational systems. Many
designers represent positions using double precision (1 part in 10^14), and at this
accuracy, a map of the entire Earth would be capable of recording accurately the
positions of large protein molecules.

Scale effects have also been considered for uncertainty modeling approaches.
Maps are always generalized, to a degree that depends largely on their
representative fraction or scale, or the ratio of distance on the map to distance on
the ground. A map at scale 1:24,000 is less generalized than one at 1:100,000.
Often, data are not available at the scale required by an application, but only at a
coarser scale. As an example, to obtain a sufficiently accurate assessment of a
given area's suitability for development may require data at the level of
generalization corresponding to 1:24,000, but the only available digital data may
be at a scale of 1:100,000. In this situation, uncertainty introduced into the results
of analysis when too-coarse data are used must be considered. Ehlschlaeger et al.
[25] and others have developed methods for simulating the information that is not
available due to excessive generalization, based on geostatistical models
calibrated in areas where data are available at both scales.

2  Application  of  Fuzzy  Models  in  GIS 
2.1  Spatial  Data  and  Attribute  Modeling 

Robinson [26, 27, 28] performed some of the earliest research on fuzzy data
models for geographic information. He has considered several models as
appropriate for this situation: the two early fuzzy database approaches using
simple membership values in relations by Giardina [29] and Baldwin [30], and a
similarity-based approach [2]. In modeling a situation in which both the data and
relationships are imprecise, he assesses that this situation entails imprecision
intrinsic to natural language which is possibilistic in nature. A possibilistic
relational (PRUF) model was chosen as providing a means of facilitating
approximate machine inference [31]. In the PRUF model, queries and propositions
are processed by identifying constraints induced by the query or proposition,
performing tests on each constraint and then aggregating the individual test results
to yield an overall test score. Consider a proposition stating that a specified
location is on gentle slopes and is near a certain city. The constraints induced by
the proposition, "gentle" and "near," are tested using a possibility distribution
yielding test results indicating the degree to which the specified location satisfies
each constraint. The two test results are then aggregated to produce an overall test
score indicating the degree to which the proposition is satisfied.
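
The test-and-aggregate procedure just described can be sketched as follows.
This is an illustration only: the piecewise-linear possibility distributions,
their breakpoints (5% and 15% slope, 2 km and 20 km distance), and the use of
min as the aggregation operator are all assumptions, since the text does not
specify them.

```python
def gentle_slope(slope_percent):
    """Possibility that a slope is 'gentle': 1 up to 5%, falling to 0 at 15%."""
    if slope_percent <= 5.0:
        return 1.0
    if slope_percent >= 15.0:
        return 0.0
    return (15.0 - slope_percent) / 10.0


def near(distance_km):
    """Possibility that a location is 'near': 1 within 2 km, 0 beyond 20 km."""
    if distance_km <= 2.0:
        return 1.0
    if distance_km >= 20.0:
        return 0.0
    return (20.0 - distance_km) / 18.0


def proposition_score(slope_percent, distance_km):
    """Aggregate the two constraint tests with min (an assumed conjunction)."""
    return min(gentle_slope(slope_percent), near(distance_km))
```

For a location on a 10% slope that lies 2 km from the city, the "gentle" test
yields 0.5 and the "near" test yields 1.0, so the overall score under min
aggregation is 0.5.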

A number of more recent papers have also appeared that use fuzziness in
modeling imprecision in spatial data. Cross [32] presents various issues in the
area of fuzzy object-oriented databases and design and how these relate to topics
involving GIS. Usery [33] developed an object-based model in which objects are
a direct representation of the geographic entities rather than geometric elements
such as point, line, and area. This design is referred to as a feature-based
geographic information system (FBGIS) since the term feature encompasses both
the geographical entity and its object representation. The approach taken is to
model the spatial dimension of geographic features using fuzzy sets to define the
spatial extent of the thematic dimension.

The focus is on features with undetermined boundaries which are themselves
fuzzy and, regardless of the accuracy of measurement, remain fuzzy based on the
concept of the feature. Examples include hills, valleys, wetlands, and many other
geographic features that cannot be rigorously bounded by a mathematical […].
The model of a fuzzy feature follows that of Leung [34] using the concepts of core
and boundary.

For the hill case, the core defines the prototype elevation values, i.e., all values
that are unquestionably a part of the hill. The periphery is defined by the
boundary and is less representative of the hill. As the complexity of the feature
increases, the complexity of the fuzzy set membership function also increases.
Once fuzzy set membership is defined, one can define fuzzy operations in a GIS
environment. Using established fuzzy set operations such as intersection, union,
and complement, a fuzzy overlay operator can be developed. Fuzzy buffer,
fuzzy overlay, including intersection, union, and complement, and fuzzy boundary
operators are used as have been developed by Katinsky [35].
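
The core-and-boundary idea and the overlay operations mentioned above can be
sketched in Python. The linear ramp across the boundary zone, the parameter
names, and the use of the standard min, max, and 1 - x operators are
assumptions for illustration; they are not the specific operators developed
in [35].

```python
def hill_membership(elevation, boundary_low, core_low):
    """Membership in 'hill': 1 in the core, linear ramp across the boundary zone."""
    if elevation >= core_low:
        return 1.0
    if elevation <= boundary_low:
        return 0.0
    return (elevation - boundary_low) / (core_low - boundary_low)


def fuzzy_intersection(a, b):
    """Cell-by-cell minimum of two membership grids of equal shape."""
    return [[min(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuzzy_union(a, b):
    """Cell-by-cell maximum of two membership grids of equal shape."""
    return [[max(x, y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fuzzy_complement(a):
    """Cell-by-cell complement (1 - membership) of a membership grid."""
    return [[1.0 - x for x in row] for row in a]
```

A fuzzy overlay of a "hill" layer with, say, a "wetland" layer then amounts to
applying fuzzy_intersection to the two membership grids, so that each cell
records the degree to which it belongs to both features.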

Another paper in the same volume by Sarjakoski [36] emphasized issues related
to identity of geographical objects. This issue arises in the practical context of
identifying objects in a complex and dynamic environment: for example, the
question of enumerating how many lakes, islands and rivers are in Finland.

Existence is a necessary property of an object. An object also has an identity.
This is the key issue when distinguishing between field- (in the mathematical
sense) and object-oriented approaches in describing geographic reality [37, 38].
There is no identity related to fields as such. Objects can be, and very often are,
associated with regions in a field; thus, fields are indirectly connected with the
concept of identity.

Many of the above concepts are closely interwoven. Take the triple
existence-extent-identity, for example. It is difficult, or even impossible, to speak of one
without another. According to our contemporary understanding of the universe,
geographic objects can only exist confined in four-dimensional space-time. When
there appears to be fuzziness, it occurs in many of the properties of an object, not
in one alone. Some of the extreme cases are:

Fuzziness only in 3D space: Fuzziness may be limited to the spatial
extent of a geographical object, i.e., the boundary of the object is ill-defined, even
though there is no doubt about its identity and temporal extent.

Fuzziness only in time: The temporal extent of a geographic object may
be ill-defined, even though the spatial extent is well-defined, i.e., the spatial
boundary is sharp.

Fuzziness in identity: This is closely related to fuzziness in spatial and
temporal extent. The issue here is whether an object has enough identity to be an
individual. With dynamic objects (objects that change in time), the concept is
linked to the problem of the extent to which an object can change and still be the
same object.

Fuzziness in class definition: Class definitions are sometimes fuzzy and
overlap, making it impossible to assign an object to only one class.

Fuzziness in class membership: Even when there are well-defined
(axiom- or definition-based) classes, the characteristics of an object may still be
such that we cannot be quite sure to which class it belongs.

In an actual survey, heuristic and rather informal classification rules were used
to define lakes, rivers, and islands. Many of the criteria cited involved linguistic
descriptions and modifiers.

In principle and by definition, a geographic object will always have an identity,
and therefore a crisp existence. This implies that a geographic object can always
have an individual name. Sarjakoski's case study demonstrates that, in reality, it is
often very difficult to state whether or not something can be modeled as an
individual object. The objects on this conceptual borderline can be said to be
fuzzy in existence. In other words, it is not clear whether the object exists at all.
Fuzzy existence is always strongly related to fuzziness in extent, whether spatial or
temporal. The task of counting the number of geographic objects deals only with
their existence; it is in principle unnecessary to know their extent. Nevertheless,
because the objects are located in the same geographic field, their spatial extent
must be determined in a consistent way.

A recent system developed by Robinson is the Knowledge Based Land
Information Manager and Simulator (KBLIMS) [39], which can provide a
perspective on fuzzy sets as a basis for managing certain kinds of uncertainty in a
geographic information system.

Several domains were considered for the use of fuzzy sets to model uncertainty
in a comprehensive, intelligent geographical information system application such
as KBLIMS. On the terrain analysis level, uncertainty representation is required
for the hydro-ecological simulation system as well as the intelligent query portion
of the system. The uncertainty model of Davis and Keller [40] shows how to
incorporate fuzzy metadata concepts with Monte Carlo simulation in a system that
can use the uncertainty information to provide an assessment of the validity of the
model output. The most direct approach for the integration of fuzzy sets at the
terrain level in KBLIMS handles the uncertainty within the process of terrain analysis
so that the output is in a crisp form. Otherwise, the objects formed from the terrain
analysis would need a capability to represent and interpret uncertainty at the
object model level.

The object model level of KBLIMS is where management of uncertainty using
fuzzy sets can be made most explicit and thoroughly integrated into the system.
Incorporation of fuzzy instances, objects, and classes depends greatly upon the
nature of the data output from the terrain analysis and the needs/capabilities of the
query model. In this way, the traditionally separate fields of representation and
spatial analysis [41] become integrated. A working object model in KBLIMS
requires implementation of the behavior of the fuzzy objects, relations, or classes.
Often, this would be the same as many spatial analysis functions, except that some
spatial analysis functions would be incorporated into portions of the query model.

KBLIMS and other sophisticated GIS applications have not been initially
designed with the use of fuzzy sets in mind for uncertainty management. There
are portions of the system which may or may not use fuzzy sets to address very
specific problems such as model self-evaluation, allowing the system to enhance
its ability to manage uncertainty generated by its own activities. However, as the
application domain evolves and experience with formalizing that domain
increases, some significant kinds of uncertainty may be specified and ultimately
managed with fuzzy sets. It is also evident that in GIS applications such as
KBLIMS, simply arriving at membership values to describe variations in soils
and their properties over space is only the beginning of the development of a
process for formalizing and managing uncertainty that will spread through the
system, requiring changes at the object model level.


2.2 Spatial Relationship Modeling


The  manner  in  which  spatial  data  is  modeled  affects  important  aspects  of  its  use, 
including  querying  capabilities  and  relationship  inferences.  Especially  important 
for  spatial  data  is  the  ability  to  model  relationships  between  spatial  features.  Such 
relationships  can  provide  a  significant  resource  for  processing  spatial  queries. 

A spatial data model that provides a representation for storing information
concerning binary relationships between two-dimensional objects is described in
previous work [42, 43, 44]. The binary relationships incorporated include both
qualitative topological and directional relationships. In this approach, "qualitative
topological relationships" include both those relationships based on topological
invariants, e.g., Egenhofer [45], and a more intuitive, less formal set of
relationships based on various other properties. The model represents a
compromise between storage and performance issues by explicitly storing
information that can easily be used for inference purposes by a fuzzy query
framework, such as that given in [44, 46].

Minimum Bounding Rectangles (MBRs) are used as the basis for object
representation. This approach, as well as that of Nabil [47], Sharma [48], and
Clementini [49], relies upon the use of MBRs as approximations of the geometry
of spatial objects. The use of MBRs in geographic databases is widely practiced as
an efficient way of locating and accessing objects in space [49]. In addition,
numerous spatial data structures and indexing techniques have been developed
that exploit the computationally efficient representation of spatial objects through
the use of MBRs [50, 51].

This model also supports fuzzy relationship representation and querying of
specific aspects of direction and qualitative topology. For direction, it is used to
answer  queries  related  to  the  degree  to  which  one  object  lies  in  a  particular 
direction  with  respect  to  a  second  object.  The  determination  of  the  degree  is  made 
through  calculations  that  utilize  area  and  axis  ratios  of  the  objects  involved.  The 
resulting  quantitative  value  is  then  mapped  to  a  range  that  corresponds  to  a  term 
known as a directional qualifier. Directional qualifiers can be used in queries to
determine, for example, that object A is mostly west of object B, or that object A is
somewhat southeast of object B.

The  need  for  such  directional  qualifiers  is  evident  when  considering  the 
difficulty  of  determining  a  precise  directional  relationship  for  2-D  objects.  The 
areal  extent  of  2-D  objects  often  results  in  the  possible  application  of  more  than 
one directional relationship. Often, centroids are used to provide a single point of
reference for determining the orientation of 2-D objects (e.g., Chang [52]). However,
some  information  loss  is  inherent  in  the  centroid  method.  This  model  preserves  all 
of  the  directional  relationships  that  exist  between  two  2-D  objects,  along  with  a 
representation  indicating  the  degree  to  which  the  objects  are  analyzed  to 
participate  in  each  of  the  relationships.  This  method  of  qualifying  directional 
relationships more closely models the way in which humans naturally process and
communicate  such  information. 

Qualitative topological relationships such as overlaps, surrounded-by, disjoint,
etc., are similarly modeled so that the degree to which each of the objects
participates in the relationship is stored for subsequent fuzzy querying.
Determination of this information utilizes various combinations of area ratios,
ranges of which are mapped to a set of linguistic qualifiers such as some, most, all,
etc., in a manner analogous to that used for directional relationships. These
qualifiers allow users to distinguish the various cases of the same relationship. For
example, given that A overlaps B, does all of A overlap some of B, or does little
of A overlap most of B? This type of distinction is frequently made during
human-to-human  interaction,  but  little  support  for  it  has  been  incorporated  into 
computer  data  models. 

The  definition  of  the  relationships  is  based  on  an  extension  of  Allen's  temporal 
relations  [53]  to  the  spatial  domain.  This  is  done  by  allowing  each  of  Allen's 
thirteen  relations  to  represent  the  interaction  of  two-dimensional  objects  in  terms 
of an x and y relationship component. The resulting set of relationships is then
used for defining topological and directional relationship terminology. A data
structure called an abstract spatial graph (ASG) is defined for the binary
relationships. The ASG maintains all necessary information regarding topology
and  direction,  and  provides  the  basis  for  processing  of  fuzzy  topological  and 
directional  queries. 
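
The idea of describing a pair of MBRs by one Allen relation per axis can be sketched as follows. This is a minimal illustration only: the names MBR, allen_relation and mbr_relation are hypothetical, intervals are assumed to have non-zero length, and the way the resulting (x, y) pairs are grouped into topological and directional terms is defined in the cited work, not here.

```python
from typing import NamedTuple, Tuple


class MBR(NamedTuple):
    """Axis-aligned minimum bounding rectangle (hypothetical helper type)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def allen_relation(a_lo: float, a_hi: float, b_lo: float, b_hi: float) -> str:
    """Return which of Allen's thirteen relations holds from interval A to interval B.

    Both intervals are assumed to satisfy lo < hi.
    """
    if a_hi < b_lo:
        return "before"
    if a_hi == b_lo:
        return "meets"
    if b_hi < a_lo:
        return "after"
    if b_hi == a_lo:
        return "met-by"
    if a_lo == b_lo and a_hi == b_hi:
        return "equals"
    if a_lo == b_lo:
        return "starts" if a_hi < b_hi else "started-by"
    if a_hi == b_hi:
        return "finishes" if a_lo > b_lo else "finished-by"
    if b_lo < a_lo and a_hi < b_hi:
        return "during"
    if a_lo < b_lo and b_hi < a_hi:
        return "contains"
    # The interiors intersect and no endpoints coincide.
    return "overlaps" if a_lo < b_lo else "overlapped-by"


def mbr_relation(a: MBR, b: MBR) -> Tuple[str, str]:
    """Describe the relation of MBR a to MBR b as an (x, y) pair of Allen relations."""
    return (
        allen_relation(a.xmin, a.xmax, b.xmin, b.xmax),
        allen_relation(a.ymin, a.ymax, b.ymin, b.ymax),
    )


# Object A lies partly to the west of B along x and entirely south of B along y.
pair = mbr_relation(MBR(0, 0, 2, 2), MBR(1, 3, 4, 5))
```

Because each axis contributes one of thirteen relations, a pair of MBRs falls into one of 169 combined cases.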

The ASG model was specifically designed to store explicit fuzzy topological and
directional relationship information for pairs of
objects. This correlates well with the observation in [45] that topological
information  should  be  considered  first-class  information  within  the  realm  of 
spatial  databases. 

Briefly,  an  ASG  is  a  directional  polar  graph  with  three  types  of  nodes:  (1) 
nodes  representing  portions,  or  object  sub-groups  of  the  first  object  participating 
in  the  relationship,  (2)  nodes  representing  object  sub-groups  of  the  second  object, 
and  (3)  nodes  representing  overlapping  areas  and/or  reference  areas.  For  the 
graphical  representation  of  an  ASG,  the  various  types  of  nodes  are  differentiated 
through  color  and  symbol  style.  As  stated  earlier,  one  of  the  basic  assumptions  for 
the  data  model  and  subsequent  query  framework  is  that  objects,  or  features,  are 
enclosed within MBRs. Reference areas for relationship calculations are defined
as being either one of the object sub-groups, a specially defined rectangular area,
or a line segment, depending upon the general categorization of the relationship
involved.

Each node of an ASG has a pair of associated weights, an area weight and a
total node weight, that provide information to support fuzzy querying.
Specifically, these weights are used to define fuzzy qualifiers that answer queries
such as, "To what degree is area A east of area B?" or, "How much of area A
surrounds area B?" Total node weights support queries of the first type, while
area weights support queries of the second type. Ranges of weights are used to
define qualitative terms useful for describing these concepts, such as "most,"
"some," "slightly," or "mostly." The manner in which the weights are computed is
described in [43].
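
The mapping from a weight to a qualitative term can be sketched as a simple range lookup. The range boundaries and the exact term lists below are hypothetical placeholders chosen for illustration; the actual ranges and weight computations are those given in [43].

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical range boundaries on a weight normalized to [0, 1].
BOUNDS = [0.05, 0.35, 0.65, 0.95]

# Hypothetical term lists: one for area weights ("how much of A ..."),
# one for total node weights ("to what degree is A east of B ...").
AREA_TERMS = ["none", "little", "some", "most", "all"]
DIRECTION_TERMS = ["not", "slightly", "somewhat", "mostly", "entirely"]


def qualifier(weight: float, terms: Sequence[str]) -> str:
    """Map a weight in [0, 1] to the term whose range contains it."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return terms[bisect_right(BOUNDS, weight)]


area_term = qualifier(0.8, AREA_TERMS)
direction_term = qualifier(0.2, DIRECTION_TERMS)
```

Keeping the boundaries in one list makes it easy to tune the ranges without touching the lookup logic.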

The ASG data model supplies the infrastructure upon which a fuzzy query
system is built. This system, the topological query framework (TQF), includes a
grammar in Backus-Naur Form (BNF), data structures and query processing
strategies. Fuzzy relationship querying follows naturally from the ASG model
due to its support for direct binary relationship representation and its weighting
system. Previous work [46] has shown how TQF can be integrated with several
existing spatial query languages to provide fuzzy querying extensions.


3  Conflation 

Conflation is typically regarded as the combination of information from two
digital maps to produce a third map that is better than either of its component
sources. The history of map conflation goes back to the early to mid-1980s. The
first clear development and application of an automated conflation process
occurred during a joint United States Geological Survey (USGS)-Bureau of the
Census project designed to consolidate the agencies' respective digital map files of
U.S. metropolitan areas [54]. The implementation of a computerized system for
this task provided an essential foundation for much of the theory and many of the
techniques used today. Since that time, others, including commercial GIS vendors,
have implemented conflation tools within their applications. For an example of
commercial work on conflation, see [55].

Conflation of maps typically is needed because: (1) users wish to update their
mapping information without losing legacy data which may not be included in the
new information; (2) one map source may be more accurate with respect to, e.g.,
positional or attribute information; or (3) one map source contains information
missing in another, such as additional features, feature attributes or even entire
coverages. Motivation for this work includes all three of these reasons and
represents an effort to more fully utilize associated non-spatial data as well as
spatial properties in an "intelligent" type of system that more closely mirrors the
way that human experts would manually conflate mapping data.

Conflation can, in brief, be viewed as a multi-step, iterative process that
involves feature matching, positional re-alignment of component maps and
attribute deconfliction of positively identified feature matches. Feature matching,
simply and perhaps somewhat obviously stated, involves the identification of
features from different maps as being representations of the same geographic
entity. Positional alignment is a mathematical procedure in which previously
identified matching features are brought into spatial agreement, while
deconfliction is a process in which contradictions in a matching pair's attributes
and/or values are resolved. Positional alignment and deconfliction are both steps
that are performed after a positive match has been determined. As such, it is easy
to see that accurate feature matching results are essential to the overall quality of
the resulting conflated map. Because of this dependency, our work thus far has
concentrated solely on feature matching aspects of conflation. Our motivation for
this work includes an effort to make individual geographic features "intelligent"
enough to know when and how to conflate themselves in a distributed
environment.


3.1  Conflation  and  Uncertainty 

The  assessment  of  feature  match  criteria  is  a  process  in  which  evidence  must  be 
evaluated  and  weighed  and  a  conclusion  drawn — not  one  in  which  equivalence 
can  be  unambiguously  determined.  For  example,  fuzzy  concepts  such  as 
"closeness"  of  two  features  and  "similarity"  of  attributes  and  feature  groupings  are 
essential  for  determining  equivalence. 

In  fact,  feature  matching  can  be  considered  as  a  type  of  classification  problem. 
That is, we are trying to determine whether one feature belongs to the same
"class" as another; in this case the class consists of members believed to be
equivalent. This type of problem can be handled through theories of evidential
reasoning or uncertainty, such as fuzzy logic [57] or Dempster-Shafer theory [58].
These theories attempt to provide likelihood measures for questions based on
available,  though  not  necessarily  conclusive,  evidence.  For  example,  in  feature 
matching,  the  question  we  must  consider  is,  "Based  on  the  available  evidence, 
what  is  the  likelihood  (or  probability)  that  feature  A  from  map  coverage  1 
represents  the  same  entity  as  feature  B  from  map  coverage  2?" 

Dempster-Shafer theory is a method of inexact reasoning based on the modeling
of uncertainty as a range of probabilities. It provides representations for the degree
of belief in evidence (belief that supports a hypothesis), as well as the degree of
disbelief in evidence (belief that refutes a hypothesis) and the nonbelief (neither
belief nor disbelief). That is, it not only supplies results regarding the belief that
two  features  are  the  same,  but  it  also  provides  a  representation  of  how  much  it  is 
disbelieved  that  they  are  the  same,  along  with  the  degree  to  which  there  is  no 
evidence  either  way. 
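
To make the belief, disbelief and nonbelief triple concrete, the sketch below applies the standard Dempster rule of combination to two pieces of evidence about the hypothesis "features A and B match." It illustrates the general theory only and is not claimed to be the combination method used in this work; the Evidence type and the numeric masses are hypothetical.

```python
from typing import NamedTuple


class Evidence(NamedTuple):
    """Mass assignment over the frame {match, non-match} (hypothetical type)."""
    belief: float      # mass supporting "match"
    disbelief: float   # mass supporting "non-match"
    nonbelief: float   # mass left on the whole frame (no evidence either way)


def combine(e1: Evidence, e2: Evidence) -> Evidence:
    """Combine two independent bodies of evidence with Dempster's rule."""
    # Conflict: one source supports "match" while the other supports "non-match".
    conflict = e1.belief * e2.disbelief + e1.disbelief * e2.belief
    if conflict >= 1.0:
        raise ValueError("sources are totally conflicting")
    norm = 1.0 - conflict
    belief = (e1.belief * e2.belief
              + e1.belief * e2.nonbelief
              + e1.nonbelief * e2.belief) / norm
    disbelief = (e1.disbelief * e2.disbelief
                 + e1.disbelief * e2.nonbelief
                 + e1.nonbelief * e2.disbelief) / norm
    nonbelief = (e1.nonbelief * e2.nonbelief) / norm
    return Evidence(belief, disbelief, nonbelief)


# Hypothetical evidence from attribute matching and from shape similarity.
attribute_evidence = Evidence(belief=0.6, disbelief=0.1, nonbelief=0.3)
shape_evidence = Evidence(belief=0.5, disbelief=0.2, nonbelief=0.3)
combined = combine(attribute_evidence, shape_evidence)
```

The three components of the combined result still sum to one, so the range of probabilities for a match runs from the belief up to one minus the disbelief.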

The  approach  described  here  for  feature  matching  draws  from  aspects  of  both 
fuzzy  set  logic  and  evidential  reasoning.  In  particular,  the  assignment  of  matching 
scores  for  linguistic  attributes  is  directly  motivated  by  work  in  modeling  linguistic 
variables  by  the  use  of  fuzzy  sets.  For  example,  we  need  to  be  able  to  determine 
the  semantic  similarity  of  an  attribute  such  as  road  surface  type  which  may  have 
a  value  of  ‘hard’  for  one  feature  and  ‘asphalt’  for  the  potential  matching  feature. 
Obviously, the semantics of the two are not strictly equivalent, but neither are they
contradictory. Likewise, the idea of combining scores from the different
components of feature matching to arrive at a single matching score is very similar
to techniques for the combination of evidence used in evidential reasoning.

Previous work [59, 60, 61] has concentrated on developing an uncertainty
model for feature matching. Human cartographic experts are skilled in making
mapping decisions based on conflicting information, and it is this skill at
reasoning with uncertain information that we attempted to mimic in this system.
Geographic features represented in different physical databases are very unlikely
to have completely identical information due to differences in collection
instruments, human interpretation, processing techniques and varying
representations of the data. Therefore, a conflation model must be able to
determine when such differences are truly indicative of non-matching features,
and when such differences could be attributed to one of the causes identified
above.

The  technique  developed  in  [59]  is  able  to  accommodate  cases  where  values  for 
corresponding  attributes  differ,  as  well  as  cases  where  the  attribute  sets  themselves 
differ.  For  implementation,  each  feature  is  considered  as  a  set  of  attribute-value 
pairs: 


$$\{(a_{11}, v_{11}), (a_{12}, v_{12}), \ldots, (a_{1n}, v_{1n})\}$$

$$\{(a_{21}, v_{21}), (a_{22}, v_{22}), \ldots, (a_{2m}, v_{2m})\}$$

From  this  representation,  a  degree  of  matching  similarity  is  determined.  A 
different  approach  is  used  for  different  attribute  value  domains.  For  numeric 
domains, a membership matching function is used, while a similarity table is used
for linguistic domains. Our approach to linguistic attribute matching is to
establish a similarity value s (in the range [0, 1]) for each pair of elements of the
domain. This value is determined from the semantics of the domain and the
linguistic terms. The characteristics of the similarity function s are:

$$s_A(x, y) = s_A(y, x) \quad \text{(symmetric)}$$

for all values x, y in the domain of attribute A. For well-defined values of the domain
(e.g., not "unknown" or "other"),

$$s_A(x, x) = 1 \quad \text{(reflexive)}$$

where x is a well-defined value.

As an example, we consider the Railroad attribute RRA (Railroad Power
Source), which is restricted to the values (0: Unknown, 1: Electrified track, 3:
Overhead electrified, 4: Non-electrified, 999: Other). The similarity table for
RRA is shown in table 1. Since linguistic similarity is symmetric, we need only
show the lower triangular values in the table. Note that since 1, 3, and 4 are
well-defined linguistic terms, they are shown with the reflexive value of 1 on the
diagonal, e.g., $s_{RRA}(3, 3) = 1$. However, since 0 and 999 are non-specific
categorical values for this particular domain, their diagonal values were
determined to be less than 1, reflecting the potential lack of exact matching for
such linguistic terms.
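
A lower-triangular similarity table of this kind can be sketched as a dictionary keyed by ordered code pairs. Every numeric value below is a hypothetical placeholder and not a value from Table 1; only the structure (symmetric lookup, a diagonal of 1 for the well-defined codes 1, 3 and 4, and a diagonal below 1 for codes 0 and 999) follows the description above.

```python
# Lower triangle only: keys are (row, column) with row >= column.
# All similarity values here are hypothetical placeholders.
RRA_LOWER = {
    (0, 0): 0.5,
    (1, 0): 0.5, (1, 1): 1.0,
    (3, 0): 0.5, (3, 1): 0.8, (3, 3): 1.0,
    (4, 0): 0.5, (4, 1): 0.0, (4, 3): 0.0, (4, 4): 1.0,
    (999, 0): 0.3, (999, 1): 0.2, (999, 3): 0.2, (999, 4): 0.2, (999, 999): 0.3,
}


def rra_similarity(x: int, y: int) -> float:
    """Look up s_RRA(x, y), using symmetry to read from the lower triangle."""
    return RRA_LOWER[(max(x, y), min(x, y))]


same_code = rra_similarity(3, 3)
forward = rra_similarity(1, 3)
backward = rra_similarity(3, 1)
```

Storing one triangle guarantees symmetry by construction, since both argument orders resolve to the same key.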


Table 1. Similarity table for attribute RRA.

Because most features have more than one attribute, we must also consider
semantic  interrelationships  among  the  attributes  in  determining  matching  features. 
These  are  represented  as  rules  in  the  expert  system  which  return  associated 
weights.  These  weights  are  used  either  to  add  credence  to  the  hypothesis  of 
matching  features,  or  to  weaken  it.  As  an  example,  consider  the  Railroad 
attributes  LTN  (Number  of  tracks)  and  RRC  (Railroad  category).  The  rule  for  the 
relationship  between  the  two  attributes  is  expressed  as: 

IF ((RR1.ltn = 3 and RR2.ltn = 2) and
    (RR1.rrc = 16 and RR2.rrc = 16))

THEN w_rrc = 1.0,
     w_ltn = 0.5,

where RR1 represents the first Railroad object and RR2 represents the second.
This rule illustrates a conflict in the values of the LTN attribute of the two
features. We see from the example that the resulting weight for LTN is weakened,
giving it less influence than that of RRC in the combined matching score.

A  composite  matching  score  is  then  computed  from  the  combination  of  the 
expert  system  weights  and  the  similarity  table  values.  This  score  is  given  as: 

$$MS_{i,j} = \frac{1}{N} \sum_{k=1}^{N} \left[ sim_{A_k}(F_i, F_j) \times ESW_{A_k} \right]$$

where

$A_k$ = kth attribute in both $F_i$ and $F_j$, where $0 < k \le N$.

$N$ = number of attributes that describe both $F_i$ and $F_j$.

$ESW_{A_k}$ = weight associated with $A_k$ computed by the expert system.
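
The composite score can be sketched as a weighted average over the attributes that both features share. This is a minimal sketch under stated assumptions: features are plain dictionaries, the per-attribute similarity functions and the numeric values are hypothetical, and an attribute without an expert-system weight is assumed to default to a weight of 1.0.

```python
from typing import Callable, Mapping

Similarity = Callable[[object, object], float]


def matching_score(
    f_i: Mapping[str, object],
    f_j: Mapping[str, object],
    similarity: Mapping[str, Similarity],
    weights: Mapping[str, float],
) -> float:
    """Average of similarity times expert-system weight over shared attributes."""
    shared = [name for name in f_i if name in f_j]
    if not shared:
        return 0.0  # assumption: no common attributes means no attribute evidence
    total = sum(
        similarity[name](f_i[name], f_j[name]) * weights.get(name, 1.0)
        for name in shared
    )
    return total / len(shared)


def exact(x: object, y: object) -> float:
    """Crisp comparison used here for categorical codes."""
    return 1.0 if x == y else 0.0


# Hypothetical railroad features and similarity functions.
rr1 = {"rra": 1, "ltn": 3, "rrc": 16}
rr2 = {"rra": 3, "ltn": 2, "rrc": 16, "nam": "spur"}
similarity = {
    "rra": lambda x, y: 0.8,                      # placeholder table lookup
    "ltn": lambda x, y: 1.0 / (1.0 + abs(x - y)),  # placeholder numeric membership
    "rrc": exact,
}
weights = {"ltn": 0.5, "rrc": 1.0}  # weights as returned by the rule above

score = matching_score(rr1, rr2, similarity, weights)
```

Only the three attributes present in both features contribute; the extra "nam" attribute on the second feature is ignored, which mirrors the case where the attribute sets themselves differ.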

3.2  Geometric  Analysis 

The  second  part  of  this  work  consists  of  a  technique  for  qualitatively  matching 
linear  features  based  on  shape  similarity.  This  broadens  the  basis  for  feature 
matching  by  adding  a  spatial  component  to  the  non-spatial  attribute-matching 
algorithm presented above.

The first step involves a pre-processing of the features to bring them into a
standard position for the shape similarity analysis. Translation, rotation, and
scaling transformations are applied in sequence to accomplish this positioning.
Translation is performed by moving the features' starting nodes to the origin of
planar axes. Then, the end nodes are rotated to the x-axis. Finally, the features
are scaled so that the starting and ending nodes are equal to [0 0] and [1 0],
respectively. These transformations allow us to more reliably compare linear
features of different sizes. Figure 1 shows the results of standardizing two linear
features through this process.
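
The translate, rotate, scale sequence can be sketched for a polyline given as a list of (x, y) nodes. The function name standardize and the sample coordinates are hypothetical; the sketch assumes the start and end nodes do not coincide.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def standardize(points: List[Point]) -> List[Point]:
    """Move a linear feature so it starts at (0, 0) and ends at (1, 0)."""
    x0, y0 = points[0]
    # 1. Translation: the starting node goes to the origin.
    shifted = [(x - x0, y - y0) for x, y in points]
    end_x, end_y = shifted[-1]
    length = math.hypot(end_x, end_y)
    if length == 0.0:
        raise ValueError("start and end nodes coincide")
    # 2. Rotation by minus the angle of the end node, bringing it onto the x-axis.
    cos_t, sin_t = end_x / length, end_y / length
    # 3. Scaling by 1 / length, so the end node lands on (1, 0).
    return [
        ((x * cos_t + y * sin_t) / length, (-x * sin_t + y * cos_t) / length)
        for x, y in shifted
    ]


feature = [(2.0, 1.0), (3.0, 3.0), (2.0, 5.0)]
standard = standardize(feature)
```

After this step two features of different size and orientation share the same start and end points, so remaining differences reflect shape alone.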


Figure 1. Linear features (a) before and (b) after standardization process.


Once the features have been standardized, a process called node merging takes
place. This requires all nodes from one feature to be mapped onto the other
feature, and vice versa. This is performed on the basis of a node's distance from
the starting node of the feature. Consider two features, F1 and F2, comprised of
nodes $(n_{11}, n_{12}, \ldots, n_{1j})$ and $(n_{21}, n_{22}, \ldots, n_{2k})$, respectively, and where the
associated ratios are given as $(r_{11}, r_{12}, \ldots, r_{1j})$ and $(r_{21}, r_{22}, \ldots, r_{2k})$ such that
$r_{11} = r_{21} = 0$ and $r_{1j} = r_{2k} = 1$, where r is the ratio of the distance from start to node n to
the overall length of the linear feature. Then, new features F1' and F2' are created
that each have the same number of nodes such that each node is the result of
merging the features' nodes based upon their ratios.
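
Node merging can be sketched by computing each node's arc-length ratio, taking the union of the two ratio sets, and interpolating both features at every merged ratio. The helper names are hypothetical, each feature is assumed to have at least two nodes and non-zero length, and ratios are compared exactly, so a practical version would likely merge ratios that differ only by rounding error.

```python
import math
from bisect import bisect_right
from typing import List, Tuple

Point = Tuple[float, float]


def arc_ratios(points: List[Point]) -> List[float]:
    """Ratio of distance-from-start to total length for every node."""
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    return [d / total for d in cumulative]


def point_at(points: List[Point], ratios: List[float], r: float) -> Point:
    """Linearly interpolate the position on the feature at ratio r."""
    i = min(bisect_right(ratios, r), len(points) - 1)
    lo, hi = ratios[i - 1], ratios[i]
    t = 0.0 if hi == lo else (r - lo) / (hi - lo)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))


def merge_nodes(f1: List[Point], f2: List[Point]):
    """Return both features resampled at the union of their node ratios."""
    r1, r2 = arc_ratios(f1), arc_ratios(f2)
    merged = sorted(set(r1) | set(r2))
    new_f1 = [point_at(f1, r1, r) for r in merged]
    new_f2 = [point_at(f2, r2, r) for r in merged]
    return new_f1, new_f2, merged


f1 = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)]   # j = 3 nodes
f2 = [(0.0, 0.0), (1.0, 0.0)]               # k = 2 nodes
m1, m2, ratios = merge_nodes(f1, f2)
```

Since the start and end ratios (0 and 1) are shared, two features with j and k nodes and no other coinciding ratios yield j + k - 2 merged nodes each.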

Given  that  the  features  are  in  standard  position  and  the  nodes  have  been 
merged,  we  begin  the  last  phase  in  assessing  shape  similarity.  This  involves 
determining  a  normalized  distance  measure  between  corresponding  nodes  of  the 
two features, given as

$$SH_{sim} = \frac{\sum_{i=1}^{j+k-2} Dist(n_{1i}, n_{2i})}{D_1 + D_2}$$

where j + k - 2 is the number of distances to be computed, and $D_i$ is the total
length of $F_i$. This implies that features with a higher shape similarity measure are
more likely candidates for matching features. It is based on Frechet's measure of
distance, $L_2$-Distance, between polygonal arcs [55]. Suppose that we have two
functions

$$P : [0, 1] \rightarrow R^2$$

$$Q : [0, 1] \rightarrow R^2$$

where $(p_0, p_1, \ldots, p_n)$ and $(q_0, q_1, \ldots, q_m)$ are ordered point sets for P and Q,
respectively. Then the set of ratios used for computing the distance between P and
Q at all of the $r_i$'s is computed as described for node merging above. Thus, for
Frechet's measure, the following is used.


$$L_2\text{-}Distance(P, Q) = \left( \int_0^1 \| P(r) - Q(r) \|^2 \, dr \right)^{1/2}$$

Now, given that $(x_{P_i}, y_{P_i})$ and $(x_{Q_i}, y_{Q_i})$ are the coordinates of $P(r_i)$ and $Q(r_i)$,
respectively, and $\Delta x_i = x_{P_i} - x_{Q_i}$ and $\Delta y_i = y_{P_i} - y_{Q_i}$ are the measured
separations of the linear features at the bending points of either of the lines, then
the closed form of the equation for $L_2$-Distance is represented by

$$L_2\text{-}Distance(P, Q) = \left( \sum_{i} \frac{r_{i+1} - r_i}{3} \left[ \Delta x_i^2 + \Delta x_i \Delta x_{i+1} + \Delta x_{i+1}^2 + \Delta y_i^2 + \Delta y_i \Delta y_{i+1} + \Delta y_{i+1}^2 \right] \right)^{1/2}$$

Of course, the mathematical exposition presented here covers the objective
portion of feature matching. The subjective portion represented by the expert
system is much more difficult to capture. In response to this problem, Foley et al.
[60, 61] present a web-based knowledge acquisition tool for capturing the expert
knowledge needed for conflation.
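
For two features that have already been standardized and node-merged, the $L_2$ distance can be evaluated in closed form because the separation between them varies linearly between consecutive merged ratios. Integrating the squared separation over one such interval gives $(r_{i+1} - r_i)/3$ times $(\Delta_i^2 + \Delta_i \Delta_{i+1} + \Delta_{i+1}^2)$ per coordinate. The sketch below sums these terms; the function name and sample data are hypothetical, and the inputs are assumed to be equal-length node lists sampled at the same ratios.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def l2_distance(p: List[Point], q: List[Point], ratios: List[float]) -> float:
    """Closed-form L2 distance between two node-merged polylines."""
    if not (len(p) == len(q) == len(ratios)):
        raise ValueError("features must be merged onto the same ratios")
    total = 0.0
    for i in range(len(ratios) - 1):
        # Separations at both ends of the interval [r_i, r_{i+1}].
        dx0, dy0 = p[i][0] - q[i][0], p[i][1] - q[i][1]
        dx1, dy1 = p[i + 1][0] - q[i + 1][0], p[i + 1][1] - q[i + 1][1]
        width = ratios[i + 1] - ratios[i]
        total += width / 3.0 * (
            dx0 * dx0 + dx0 * dx1 + dx1 * dx1
            + dy0 * dy0 + dy0 * dy1 + dy1 * dy1
        )
    return math.sqrt(total)


# A bent feature and a straight one, both sampled at ratios 0, 0.5 and 1.
bent = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0)]
straight = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
distance = l2_distance(bent, straight, [0.0, 0.5, 1.0])
```

Identical features give a distance of zero, and the value grows as the two shapes separate along their length.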

4  Distributed  Spatial  Systems 

In  recent  years,  the  trend  for  most  types  of  information  systems  has  been  a  move 
to  a  more  loosely  coupled,  distributed  nature.  The  maturity  of  client-server 
architectures  and  software,  as  well  as  the  virtual  explosion  of  web-based  systems, 
has  demonstrated  the  enormous  advantages  of  distributed  systems.  Furthermore, 
the  advent  of  successful  middleware  technology  such  as  the  CORBA  2.0  standard 
and  corresponding  vendor  implementations  has  drastically  reduced  barriers  to 
communication  and  data  sharing  among  heterogeneous  software  and  hardware 
systems.

The interest in distributed GIS is no less than that of general information
systems; however, the uniqueness of the nature of spatial data makes the issue of
true interoperability of GIS a major research concern. Evidence of this is
abundant in the literature, and is also illustrated by initiatives such as the Open
Geodata Interoperability Specification (OGIS) work by the Open GIS Consortium,
Inc., as well as UCGIS priority research panels [14] on "Interoperability of
Geographic  Information"  and  "Spatial  Data  Acquisition  and  Integration." 


4.1  Uncertainty  and  Conflation  for  Distributed  Data 

Obviously,  issues  of  uncertainty  that  apply  to  conflation  within  a  single  system — 
such  as  those  for  feature  matching — are  also  applicable  to  conflation  in  a 
distributed  environment.  However,  we  believe  additional  factors  related  to  the 
general  topic  of  distributed  databases  increase  the  scope  of  uncertainty  that  must 
be  considered  in  this  context.  As  background,  we  can  draw  from  the  abundance  of 
past and ongoing research in the realm of schema merging for conventional (i.e.,
relational) distributed heterogeneous databases. An example of work in this area
is [62].

The  general  concept  of  schema  merging  involves  resolution  of  incompatibilities 
in  metadata.  These  incompatibilities  may  be  either  structural  or  semantic  in 
nature.  Structural  incompatibilities  involve  those,  for  example,  in  which  attributes 
for  representing  the  same  values  are  defined  differently.  These  may  include 
different  names  for  the  attributes,  or  different  domains  for  their  associated  values, 
e.g., float vs. integer. Semantic incompatibilities, on the other hand, represent
those cases in which similarly defined attributes have different meanings or
values. For example, an attribute of width for a road in one database may include
the width of the road plus any associated rights-of-way, while the same attribute
name in another database may only imply the width of the paved/driveable portion
of the road. Semantic incompatibilities are much more difficult to handle
automatically,  as  they  necessarily  imply  a  deeper  understanding  of  the  data. 

It is clear that these issues are very similar to ones that must be faced in
performing  conflation  in  distributed  spatial  databases.  In  particular,  semantic 
schema  integration  and  the  feature-matching  phase  of  conflation  require  similar 
levels  of  knowledge  regarding  the  meanings  that  various  data  are  intended  to 
convey.  Semantic  knowledge  is  inherently  uncertain,  as  interpretations  of  even 
the  most  seemingly  unambiguous  words  and  phrases  vary  among  individuals. 
Similarly,  structural  differences  in  spatial  data  representation  for  like  features  are 
to  be  expected  in  any  distributed  system  comprised  of  heterogeneous  data  sources. 
It  is  evident  from  this  discussion  that  conflation  in  a  distributed  environment  can 
be  viewed  as  a  specific  application  of  issues  related  to  uncertainty  in  schema 
merging. 

In keeping with generally accepted principles of distributed (as well as
non-distributed) database design, the model presented in [63] is completely
independent of physical concerns such as data partitioning. However, the general
issue of considering conflation within a distributed system versus a standalone
system does impact design decisions, even at an abstract logical level. The
primary issue considered in this work is centralized versus distributed control
of conflation events.

4.2  Object-Oriented  Modeling  Approach 

The logical design for this approach is based on an object-oriented (OO) model.
The OO paradigm is well accepted as the prevailing method for the representation
and manipulation of complex data such as geographical information. Within an
OO framework, one is able to define models of real-world data in ways analogous
to those in which we intuitively perceive and interpret those data. As a general
introduction to the subject, an object is a collection of data (state) and methods
(behavior) that represents the properties and processes of a real-world entity, such
as the Chesapeake Bay Bridge or the Mississippi River. A class is a template for
creating new objects that share common properties. For example, there may be a
bridge class that captures generic information that every instance of that class (an
instantiation) should contain. This would most likely include data such as length,
height, maximum weight limit, and type (drawbridge, suspension, etc.). Examples
of procedures for a bridge class could include opening and closing.

The packaging of an object's data with its procedures is known as
encapsulation. Encapsulation allows modifications/additions to the system with
minimal impact on other system components. This property is crucial for the
successful development and maintenance of complex software systems such as
GIS. Other major concepts for understanding OO models include inheritance and
polymorphism. Inheritance is the automatic inclusion of attributes and methods
from classes defined at a higher level in a class hierarchy. The class hierarchy is
designed with more general classes being placed at higher levels and more
specific classes being placed at lower levels. Inheritance reduces the need to
duplicate data and code, while the structuring of a class hierarchy provides a
logical organization of object "types."

Polymorphism  allows  methods  for  different  objects  to  have  the  same  name, 
while  providing  different  implementations.  For  example,  both  a  circle  and  a 
square  class  could  implement  a  method  that  calculates  area.  Both  methods  could 
be  named  area,  but  the  implementations  would  use  the  appropriate  formula  for 
either a circle or a square. Methods that invoked the area method for an object
would use one name; the class of the receiving object would determine which
implementation  to  use.  Polymorphism  helps  to  simplify  system  design,  and 
reduces  coding  complexity  by  eliminating  error-prone  if-then-else  type-checking 
constructs. 
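
The circle and square example can be sketched directly. The class and function names are illustrative only; the point is that total_area calls a single method name and never inspects the type of each object.

```python
import math
from typing import Iterable


class Shape:
    """Base class declaring the common interface."""

    def area(self) -> float:
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def total_area(shapes: Iterable[Shape]) -> float:
    """Sum areas without any if-then-else type checking."""
    return sum(shape.area() for shape in shapes)


combined_area = total_area([Circle(1.0), Square(2.0)])
```

Adding a new shape class requires no change to total_area, which is the reduction in coding complexity described above.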

These  concepts  are  applied  to  our  work  in  the  following  ways.  Encapsulation 
allows  each  geographic  feature  to  maintain  its  own  state  of  knowledge  related  to 
information  and  procedures  necessary  for  conflation  with  another  feature. 
Changes  in  the  algorithms/implementations  of  a  particular  feature’s  conflation 
abilities  do  not  affect  other  features.  Inheritance  allows  us  to  describe  general 
types  of  conflation  knowledge  applicable  to  multiple  feature  classes  within  a 
single class at a high level in the hierarchy. Lower-level classes that are more
specific then automatically inherit this knowledge. Finally, polymorphism allows
us to implement general conflation procedures that are valid for instances of any
feature class; thus, polymorphism greatly reduces the need to be concerned with
the issue of "type." The result is that we are able to treat, for example, railroad
features and building features in the same manner at a logical level. Figure 2
shows a simplified class hierarchy for conflation.


The Naval Research Laboratory's Geospatial Information Database (GIDB)
prototype [64], within which proof-of-concept implementation of this model is
currently being performed, is centered on the concept of spatial range queries, also
known as area-of-interest (AOI) queries. Given an AOI, through definition of
manual or graphical bounding box coordinates or geographical place name, the
GIDB is able to return any vector, raster or multimedia data available for that area.
Advanced queries, such as those limiting vector data attribute values, are also
available.

Our conceptual model of the conflation process is based on this system model
of AOI queries, limited to the consideration of vector data only. A diagram of the
conflation process is shown in figure 3. The primary point to note is that the
process takes place at the individual feature level, irrespective of that feature's
physical residence, or logical database, library or coverage inclusion. In the first
step, the user selects an AOI. The query manager then retrieves all feature objects
from the distributed database that fall within the AOI (subject to any other
constraints imposed by the user). From this collection of objects, the query
manager randomly selects one object and sends it a “conflate” message. That object
follows the protocol for determining which, if any, of the remaining objects in the
query collection are matching representations for itself. Any object determined to
be a match is placed in the matching feature set for the conflating object, ranked
according to similarity scores. Objects that have been placed in a matching feature
set are removed from the general query collection, so as not to be candidates for
subsequent conflation iterations. This philosophy implies that only those objects
that have scores strongly suggesting matching features are placed in the matching
feature sets. The query manager continues by sending “conflate” messages to the
remaining members of the query collection. When the process is complete, the
query manager returns the query collection for display. For any features with
non-empty matching feature sets, the member of the set with the highest quality score
(NOT the same as the similarity score) is returned as the representative for that
feature. The quality score is an indication of the fitness for use of the data, and is
based on multiple factors derived from both the data and metadata.
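
As a rough illustration of the loop described above, the following Python sketch
mimics the query manager and the per-feature "conflate" protocol. It is a minimal
sketch under stated assumptions, not the GIDB implementation: the Feature class, the
distance-based similarity rule, the 0.8 threshold, and the choice to let the
conflating feature compete with its own matches for the role of representative are
all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical feature object; names and scoring rule are illustrative."""
    name: str
    x: float
    y: float
    quality: float                      # fitness-for-use score (not similarity)
    matches: list = field(default_factory=list)

    def similarity(self, other):
        # Toy similarity: decays with the distance between the two positions.
        dist = ((self.x - other.x) ** 2 + (self.y - other.y) ** 2) ** 0.5
        return 1.0 / (1.0 + dist)

    def conflate(self, candidates, threshold=0.8):
        """Keep only strongly matching candidates, ranked by similarity."""
        scored = [(self.similarity(c), c) for c in candidates]
        strong = [pair for pair in scored if pair[0] >= threshold]
        strong.sort(key=lambda pair: pair[0], reverse=True)
        self.matches = [c for _, c in strong]
        return self.matches


def conflate_query(features, rng=random):
    """Query-manager loop: one representative per conflated feature."""
    pool = list(features)
    conflated = []
    while pool:
        # Randomly select the next object to receive the "conflate" message.
        current = pool.pop(rng.randrange(len(pool)))
        matched_ids = {id(m) for m in current.conflate(pool)}
        # Matched objects leave the pool and are never conflated themselves.
        pool = [f for f in pool if id(f) not in matched_ids]
        conflated.append(current)
    # Assumption: the conflating feature competes with its own matches.
    return [max([f] + f.matches, key=lambda g: g.quality) for f in conflated]
```

With two nearby features of quality 0.5 and 0.9 and one distant feature, the sketch
returns the 0.9 feature and the distant one, whichever object happens to be selected
first.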


Several points on the conceptual model are worth noting here. First, the
responsibility for conflation is distributed. Each individual feature contains the
inherited knowledge needed for performing feature matching for that particular
class of geographic objects. This is preferable to a centralized conflation system,
where a single conflation object “manages” the process. Such a system would be
more difficult both to implement and maintain due to differences in the ways in
which objects from various feature classes should be matched, as well as the
ramifications of adding new features and/or system nodes for a distributed system.
Second, the process is automatic. Because the query manager collects the features
satisfying the user query and invokes the conflation method for each object before
returning the final set, the user does not have to explicitly request conflation, or
even need to know that conflation is an issue. Third, when the results are returned
to the user, only a single representation of each object is presented. For now, our
treatment of deconfliction is simply to select the “best” feature from a matching
feature set, based on various objective and subjective criteria. This idea of
simplifying the user's concern and involvement with conflation is critical for
end-user based systems, as the whole need for such an automated approach is derived
from the concept of non-expert users. A more detailed explanation of the model
can be found in [63].

Though  irrelevant  at  a  conceptual  level,  physical  allocation  of  data  in  a 
distributed  system  is  of  paramount  concern  for  performance  issues.  Here,  we 
briefly  consider  one  possible  allocation  scheme  and  the  potential  impact  as  related 
to  the  model. 

The scheme we consider is a partitioning based on representational classes.
That is, all line features such as railroad lines, power lines, etc., reside on the same
physical node; all point features are likewise allocated to the same node, as are all
area features. In consideration of the hierarchy given in figure 2 within this setup,
we believe the optimal partitioning of the hierarchy is to include all level 3
(feature class) classes and their instantiations together with their corresponding
level 2 superclass. For example, all transportation lines, utility lines, etc. would
be co-resident with the line class object. For conflation, this setup would
minimize network traffic needed for transfer of data and execution of methods, as
most of the specific conflation knowledge is inherited from the SCO and values
instantiated at these lower two levels. Of course, other partitionings may also
provide acceptable performance, but this one is given as an example of
considerations between data allocation methods and the distributed conflation
model.

5  Concluding  Remarks 

The importance of uncertainty modeling for spatial data and GIS is well
recognized in the geographic information science research community. We have
surveyed the relevant issues and some specific efforts at spatial modeling using
fuzzy set approaches. As distributed systems and web-based interoperability have
come to the forefront, uncertainty issues are reflected in such environments as
well.

Our current research directions are to more fully develop formal approaches to
distributed spatial data semantics and to produce prototypes based on these
approaches.

The widespread use of computers has fostered a progression of mapping from
paper charts, maps, and satellite photos to digital formats. The National
Imagery and Mapping Agency (NIMA), bearing the sole responsibility for developing
mapping information for the Department of Defense, embarked on a program to
transform their traditional paper mapping products into digital format. To accomplish
this goal, in the 1980s, NIMA designed the Vector Product Format (VPF) as its database
specification. VPF was developed as a relational data model. As the VPF products were
reviewed and used, however, it became apparent that the relational format was not
suited to the complexity of spatial mapping data. For example, the use of many tables
to represent spatial topology of one feature resulted in referential integrity problems
during an update process.

The Naval Research Laboratory's (NRL) Digital Mapping, Charting, and Geodesy
Analysis Program (DMAP) at Stennis Space Center in Mississippi proposed a possible
solution to some of the VPF problems. An alternate data model using object-oriented
technology seemed to accommodate the complexity of spatial data. In 1994, the DMAP
team was able to successfully prototype the first object-oriented application using one of
NIMA's VPF products, the Digital Nautical Chart (DNC). The prototype showed the
reusability and rapid prototyping that resulted from use of object-oriented technology.
Consequently, DMAP expanded the object-oriented data model to integrate the other
VPF products with the DNC into an object-oriented VPF (OVPF). These additional VPF
products are Digital Topographic Data (DTOP), Vector Map (VMAP), Urban Vector Map
(UVMAP), World Vector Shoreline (WVS), and Foundation Feature Data (FFD). DMAP
has continued to advance OVPF to include other NIMA data types such as Raster Product
Format (RPF), which is the basic format for raster products such as satellite imagery
and scanned charts; and Text Product Standard (TPS), which uses an SGML-based standard
for textual information such as sailing directions.

Having demonstrated that object-oriented technology easily accommodates the
complexity of the spatial data, the next milestone for NIMA and DMAP was to provide a
vehicle for distribution. As Web technology began to advance with the rise of the Java
programming language and object-oriented standards committees such as the Object
Management Group (OMG), DMAP was able to prototype spatial information
distribution over the Web. This was through an area-of-interest (AOI) application called
the Geospatial Information Distribution System (GIDS). The GIDS provides all available
information over the specified AOI independent of the data type. Some of the more
diverse data types handled by the GIDS include ESRI's shapefile format, video clips,
audio clips, and other industry standards such as tiff, gif, and jpeg. Adding these to the
GIDS was relatively simple due to the object-oriented nature of the data model.

Much of the work to date has been the database design to accommodate the multiple
spatial data types, the development of a Web applet for map display, and the
implementation of query mechanisms. While this work has concentrated on two-dimensional
mapping, the need for the three-dimensional representation of some geographic entities
to support mission preparation and rehearsal, especially in an urban setting, is also
present. Thus DMAP, in conjunction with the University of New Orleans, proceeded to
research and design a 3D spatial data model, called VPF+.

This  chapter  describes  the  overall  system  and  database  design.  As  an  example,  an 
actual  experimental  situation,  the  March  1999  Urban  Warrior  Experiment  (UWE),  is 
used  to  demonstrate  the  use  of  GIDS.  Furthermore,  the  VPF+  data  model,  as  well  as  its 
applicability  in  the  UWE,  is  presented. 


Use of Object-Oriented Technology and ODBMS

Object-oriented technology was chosen to handle the complexity of spatial data. Some
spatial data, such as VPF produced by NIMA, is already designed as relational data. A
minimum of nine tables is used to define one point feature; at minimum, sixteen tables
are used to define one polygon feature. Rather than considering a feature as rows of
nine to sixteen tables, an object-oriented approach allows data manipulation and
management at a feature level. An analogous point feature object would have the same
values that are defined from the rows of nine tables, as an example, for a specific
feature; however, instead of having to access feature information from different rows of
different tables for a given feature, a single access to a feature object would provide
all the information for that given feature.

The complexity of spatial data is also due to the stored topology (i.e., immediate
neighbor information). When the geometry of a feature changes in any manner, this
requires the stored topology to be recomputed. Recomputation implies a change to
table columns for a certain row. Due to the dependency on other related tables,
recomputation may require several tables to be changed. Recomputation may also affect
adjacent feature information. Tables related to the adjacent feature may also be
modified. Such a rippling effect of feature updates makes the manipulation and
management of spatial data in relational format error-prone. Maintenance of referential
integrity during these updates of multiple features is an additional concern. On the
other hand, for feature objects, a change is localized to that feature object only. Thus, a
change is applied by accessing the feature object. Likewise, topology recomputation
requires those adjacent feature objects to be accessed and modified accordingly.

As  the  volume  of  data  increased  and  the  need  for  multi-user  access  developed,  the 
need  to  have  true  database  functionalities  such  as  persistent  storage,  controlled  access 
to  the  data,  and  backup  and  recovery  capabilities,  among  others,  became  important. 
Object-oriented  database  management  systems  (ODBMS)  provide  these  functionalities 
specifically  for  objects. 

The distinction between persistent and transient objects is somewhat less clear for
ODBMSs than RDBMSs, due to the tightly coupled nature of the object-oriented
database with the application. Transient objects are defined as those objects that exist
in the computer's memory only during execution of the application, whereas persistent
objects exist even when there is no application program running. Transient and
persistent objects can coexist within a running process. Persistent data accessed from
the database can be assigned to a transient object. It is important for application
programs to manage both transient and persistent data in consistent ways so updates to
persistent data are made when needed, and data that should remain transient are not
inadvertently made persistent.

A  natural  extension  to  the  decision  of  using  object-oriented  technology  and  an 
ODBMS  was  the  utilization  of  the  Common  Object  Request  Broker  Architecture 
(CORBA).  CORBA  allows  interoperability  among  different  programming  languages  as 
well as different hardware platforms. For example, this project demonstrated
interoperability between Smalltalk and Java, and from a Solaris platform to Windows NT
using  CORBA  infrastructure.  Since  the  ODBMS  selected  provided  an  Object  Request 
Broker  (ORB)  directly  interfaced  to  the  ODBMS,  the  data  access  was  achieved  at  the 
repository  level.  Thus,  multi-user  access  was  provided  through  different  clients  such 
as  applets,  which  enabled  direct  Web  browser  interaction  with  the  ODBMS. 


2D  GIDS  Design 


The  GIDS  has  a  client/server  architecture.  It  is  composed  of  server,  interface,  and  client 
modules as shown in Figure 17.1.


Figure  17.1  GIDS  system  components. 

Server

A  commercial,  off-the-shelf  object-oriented  database  management  system,  GemStone, 
is  used  as  an  object  server  that  stores,  manipulates,  and  processes  objects  referenced  by 
each  client.  The  server  consists  of  two  functional  modules:  data  storage  and  data 
manipulation  or  processing.  Based  on  the  request  from  each  client,  the  GemStone 
server  searches  and  retrieves  only  those  objects  that  meet  the  requested  criteria.  Data 
search  for  retrieval  is  performed  mostly  on  the  server. 

The server maintains a class hierarchy as shown in Figure 17.2. The MapObject class is
the super class or parent to all the spatial classes. Each database has its subclasses. For
example, Figure 17.3 shows the class hierarchy for VPFDatabase and MMDatabase.

Each subclass of MapObject, with the exception of MMDatabase noted below, has a
global variable, Databases, which maintains a collection of all its instances. The
VPFDatabase class is the superset of all VPF data, and has a class variable or a global
dictionary called Databases that contains all instances of the VPFDatabase class. A root
entry to any feature access begins with the Databases dictionary. Likewise, other classes
such as ShapeDatabase have a global variable called Databases. The MMDatabase class,
however, does not have a global variable that maintains a collection of all its
instances. This is because the other database classes have a complex data structure such
as the complex hierarchical grouping as well as metadata information. This can be seen
in Figure 17.4.

For  MMDatabase  instances,  however,  each  instance  is  a  multimedia  object  such  as  a 
JPEG object. Thus, rather than maintaining a global variable called Databases to hold all
the  instances,  another  global  variable  is  used,  MMmanager. 

The  MMmanager  is  an  instance  of  VPFSpatialDataManager  class.  The  GIDS  has 
implemented  a  quadtree  based  on  a  minimum-bounding  rectangle  to  index  all  spatial 
objects.  This  is  based  on  the  regular  recursive  subdivision  of  blocks  of  spatial  data  into 
four equal-sized cells, or quadrants. Cells are successively subdivided until some
criterion is met, usually either that each cell contains homogeneous data, or that a
preset number of decomposition iterations have been performed. Thus, the cells of a
quadtree are of a standard size (in powers of two) and are nonoverlapping in terms of
actual representation. This can be seen in Figure 17.5.
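
To make the indexing idea concrete, the following Python sketch shows one common
quadtree variant for minimum-bounding rectangles (MBRs). It is a minimal sketch under
assumptions, not the GIDS code: the class name QuadTree, the tuple layout
(xmin, ymin, xmax, ymax), and the rule of storing each object in the smallest cell that
fully contains its MBR, up to a preset depth, are illustrative choices.

```python
class QuadTree:
    """Region quadtree of (mbr, item) pairs; mbr = (xmin, ymin, xmax, ymax)."""

    def __init__(self, bounds, max_depth=6, depth=0):
        self.bounds = bounds
        self.max_depth = max_depth      # preset number of decomposition steps
        self.depth = depth
        self.items = []                 # entries that fit no single quadrant
        self.children = None            # four equal-sized cells, made lazily

    def _quadrants(self):
        xmin, ymin, xmax, ymax = self.bounds
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]

    @staticmethod
    def _contains(outer, inner):
        return (outer[0] <= inner[0] and outer[1] <= inner[1]
                and inner[2] <= outer[2] and inner[3] <= outer[3])

    @staticmethod
    def _intersects(a, b):
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    def insert(self, mbr, item):
        """Push the entry down to the smallest cell that contains its MBR."""
        if self.depth < self.max_depth:
            for i, quad in enumerate(self._quadrants()):
                if self._contains(quad, mbr):
                    if self.children is None:
                        self.children = [
                            QuadTree(q, self.max_depth, self.depth + 1)
                            for q in self._quadrants()]
                    self.children[i].insert(mbr, item)
                    return
        self.items.append((mbr, item))

    def query(self, aoi):
        """Return every item whose MBR overlaps the area of interest."""
        found = [item for mbr, item in self.items
                 if self._intersects(mbr, aoi)]
        for child in self.children or []:
            if self._intersects(child.bounds, aoi):
                found.extend(child.query(aoi))
        return found
```

A query descends only into cells that overlap the AOI, which is what makes the index
cheaper than scanning every object.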


Figure  17.2  GIDS  class  hierarchy. 

Figure 17.3 GIDS subclass hierarchy.

This spatial indexing scheme is the key to the spatial data integration. All data are
spatially indexed into a quadtree. Each database maintains its own quadtree, which
allows independent spatial indexing of each common data set. A global spatial
integration of all objects in the ODBMS is achieved by addressing all the quadtree
instances in the ODBMS. This is how the AOI request is accomplished; the spatial
integration enables users to find all information available at the specified AOI. In
other words, if, for example, two different data sets from different sources provide
information over San Francisco, California, the GIDS would let the user know that there
is information about San Francisco, California from two different data sets that are
different in format and contents. This basically achieves the AOI-driven search
capability in the GIDS.
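
The integration step can be sketched as a loop over the per-database indexes. The
Python below is a minimal illustration, assuming a hypothetical SpatialIndex class that
stands in for each database's quadtree (it uses a linear scan for brevity) and a
hypothetical aoi_search function; none of these names come from the GIDS itself.

```python
def intersects(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class SpatialIndex:
    """Stand-in for one database's quadtree (linear scan for brevity)."""

    def __init__(self):
        self.entries = []

    def insert(self, mbr, item):
        self.entries.append((mbr, item))

    def query(self, aoi):
        return [item for mbr, item in self.entries if intersects(mbr, aoi)]


def aoi_search(indexes, aoi):
    """Ask every database's index about the AOI; keep those with hits."""
    results = {}
    for db_name, index in indexes.items():
        hits = index.query(aoi)
        if hits:
            results[db_name] = hits
    return results
```

Because each data set keeps its own index, a new database can be added to the
dictionary without re-indexing the others.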

Interface 

An object request broker (ORB) functions as an interface to the server and client. A
server has to have its own broker and a client has to have its own broker. GemORB is a
CORBA 2.0-compliant object request broker used by the server. GemORB provides an
interface for the Smalltalk programming language. GemORB interfaces directly to the
ODBMS. Clients use VisiBroker as their CORBA 2.0-compliant brokers. VisiBroker
provides an interface for the Java programming language.

GemORB establishes a connection between the client broker, VisiBroker, and the
object server, GemStone, through CORBA-compliant communication. See the works by
Thomas Mowbray et al. [Mowbray 1997a], the OMG [OMG 1996], and Jon Siegel
[Siegel 1996] for information on CORBA. An Interface Definition Language (IDL) file
defines a correct mapping of objects between the client and the server. An IDL file also
defines operations or methods that are available for clients to invoke on the server.

Figure 17.4 VPF data model.


Since GemORB and VisiBroker are based on CORBA, all the benefits of interoperability
among programming languages and platforms apply.

An IDL is essential for communication between different programming languages.
An IDL file must be created to allow for correct mappings of objects from one
application to another; it is the means by which potential clients determine what
operations are available and how they should be invoked. In our system, our IDL file
defines all of the objects that are common to both client and server, as well as methods
that can be invoked to perform certain operations on the server, as shown in Listing 17.1.

The first object that is utilized by both the Java client and the GIDS server is a
bounding box for the AOI, an instance of the BoundingBox class. To define our
BoundingBox object, we use a struct data type that allows related items to be grouped
together. For example, struct Point {float x,y;}; defines a point to be made up of two
float values, x and y. Similarly, our BoundingBox is defined to be composed of two
points, an origin and a corner: struct BoundingBox {Point origin; Point corner;};.
We then defined an interface called GIDBsyms, which contains the methods
on the server that are invoked by the client. An interface is the most important aspect
of IDL, since it provides all the information needed for a client to be able to interact
with an object on the server.

Note  that  the  interface  contains  the  method  names  with  their  parameters,  as  well  as 
the  data  type  of  the  returned  object.  The  most  complex  structure  defined  in  our  IDL  is 
the  struct  VectorFeature. 

The Smalltalk application has an object class called VPFFeature from which our
VectorFeature is derived. The Smalltalk VPFFeature class is more complex. The
VPFFeature objects have geometry, as well as attributes. The geometry information
basically provides the footprint of the object in pairs of latitude and longitude.
Attribute information consists of meta-data and actual properties or characteristics of
the object. The meta-data provides information about the data, such as information
relevant to the VPF specification as well as information like the originator and
security. VPFFeature objects are richly attributed; for example, a DNCBuilding has
attributes such as bfc, which is building function code, and hgt, for the height of a
building. For our Internet applet, though, only those attributes that are needed for
display and user queries are defined as shown here.

Our IDL contains another structure that defines Attribute: struct Attribute
{string name, value;};. The Attribute structure is used as part of the VectorFeature
structure and gives attribute names and values for a given feature instance. For example,
a given building may have an attribute name "Structure Shape of Roof" with attribute
value "Gabled."

The final data type included in our IDL file is a sequence, which is similar to a
one-dimensional array, but does not have a fixed length. We use sequences to reduce the
number of messages passed from server to client. The size of each sequence is
determined dynamically on both the server and the client.

    module GIDBmodule {

        struct Point {float x, y;};
        struct BoundingBox {Point origin, corner;};
        struct Attribute {string name, value;};

        typedef sequence<Attribute> AttributeColl;
        typedef sequence<Point> Coordinates;
        typedef sequence<string> StringColl;

        struct VectorFeature {
            string featname;
            long type;
            AttributeColl attributes;
            Coordinates cords;
            BoundingBox boundingBox; };

        struct MediaFeature {
            string objectType;
            string description;
            string filename;
            BoundingBox boundingBox; };

        typedef sequence<VectorFeature> FeatureColl;
        typedef sequence<MediaFeature> MediaColl;

        interface GIDBsyms {
            StringColl ReturnDatabasesForAOI(in BoundingBox aBB);
            StringColl ReturnCovsAndFeatsForAOI(in BoundingBox aBB, in string
                dbname, in string libname);
            FeatureColl ReturnFeats(in StringColl featColl, in BoundingBox aBB);
            MediaColl ReturnMediaFeats(in BoundingBox aBB);
        };
    };

Listing 17.1 IDL used by GIDS.
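
For readers who want to experiment with these structures outside a CORBA toolchain, the
following Python sketch mirrors the IDL structs as dataclasses. It is only an
illustration and not the IDL-to-Java mapping used by the applet; the helper
bounding_box_of and the convention that origin holds the minimum coordinates are
assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Point:
    x: float
    y: float


@dataclass
class BoundingBox:
    origin: Point
    corner: Point


@dataclass
class Attribute:
    name: str
    value: str


@dataclass
class VectorFeature:
    featname: str
    type: int
    attributes: List[Attribute] = field(default_factory=list)  # AttributeColl
    cords: List[Point] = field(default_factory=list)           # Coordinates
    boundingBox: Optional[BoundingBox] = None


def bounding_box_of(points):
    """Hypothetical helper: smallest box enclosing a coordinate sequence.

    Assumes origin is the minimum corner and corner the maximum corner.
    """
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return BoundingBox(Point(min(xs), min(ys)), Point(max(xs), max(ys)))
```

The lists play the role of the IDL sequences: their length is not fixed in the type
and is determined when the feature is built.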


This  IDL  file  must  be  compiled  on  both  the  client  and  the  server.  On  the  server,  the  IDL 
is  filed,  bindings  to  objects  are  made  appropriately,  and  new  methods  are  created.  On  the 
Java  client,  the  process  is  similarly  performed  via  an  IDL  to  Java  mapping.  Objects 
defined  in  the  IDL  can  then  be  referenced  and  used  in  both  the  client  and  server  code. 

Client 

A client request for information expects the object server to search and completely
process the information. A client therefore receives fully processed information that
can be used readily. Fully processed implies that the data are in a format readily usable
by the clients. Once the client receives the requested data, these data are cached on the
client for performance enhancement. Thus, any functionalities that need to be performed
using the data, such as spatial query, are all performed on the client.

Clients have online access to geospatial data such as raster images and vector
features over the Internet. These geospatial objects would be retrieved from a GemStone
database. Communication between the server and a client is accomplished using
CORBA-compliant vendor ORBs. The use of VisiBroker on the client side and
GemORB for the database server is completely transparent to anyone accessing the
applet. Figure 17.6 shows the basic architecture of our system. A Web-based client has
the capability to display, select, and query objects interactively.

Retrieval of features from the GIDS is based on the concept of area of interest
(AOI). The first screen of the applet displays a world map from which the user can
select a location graphically through the use of a rectangle (bounding box), as shown in
Figure 17.7. The user also has the option of entering the coordinates for the AOI
manually, or selecting a predetermined region. From the user input, a bounding box of the
AOI is transmitted from the applet via CORBA to the Smalltalk server.
The server responds with a set of database and library names for which data are
available in that region. NIMA provides VPF data in databases, and each database
contains one or more libraries. The user then selects a database and library, resulting
in a list of coverages and feature classes being returned from the server through another
CORBA request. Finally, the user selects the feature classes of interest and submits a
request for them to be displayed, as shown in Figure 17.8. This request results in
further CORBA communication, and the server returns a set of all of the features of the
requested classes that are located in the given AOI. These returned features are
complex objects with both geometric (coordinate) and attribute information. The
applet can then display, select, and query on the returned features.
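
The three-step exchange described above can be traced with a small in-process mock.
The Python below is a sketch under assumptions: MockGIDSServer and browse are
hypothetical names, the sample records are made up, point features stand in for real
geometry, and ordinary method calls replace the CORBA requests. Only the three operation
names are taken from the IDL in Listing 17.1.

```python
class MockGIDSServer:
    """In-process stand-in for the remote server; all data are made up."""

    def __init__(self, features):
        # features: list of (database, library, feature_class, (x, y)) tuples
        self.features = features

    @staticmethod
    def _inside(pt, bb):
        (x0, y0), (x1, y1) = bb
        return x0 <= pt[0] <= x1 and y0 <= pt[1] <= y1

    def ReturnDatabasesForAOI(self, bb):
        """Step 1: database/library names that have data in the AOI."""
        return sorted({f"{db}/{lib}" for db, lib, _, pt in self.features
                       if self._inside(pt, bb)})

    def ReturnCovsAndFeatsForAOI(self, bb, dbname, libname):
        """Step 2: feature classes available in the chosen library."""
        return sorted({cls for db, lib, cls, pt in self.features
                       if (db, lib) == (dbname, libname)
                       and self._inside(pt, bb)})

    def ReturnFeats(self, feat_classes, bb):
        """Step 3: the features of the requested classes inside the AOI."""
        return [(cls, pt) for _, _, cls, pt in self.features
                if cls in feat_classes and self._inside(pt, bb)]


def browse(server, bb, pick_db, pick_classes):
    """Run the three requests in order; user choices arrive as callbacks."""
    db, lib = pick_db(server.ReturnDatabasesForAOI(bb)).split("/")
    classes = pick_classes(server.ReturnCovsAndFeatsForAOI(bb, db, lib))
    return server.ReturnFeats(classes, bb)
```

Each callback models one user selection in the applet, so the number of round trips to
the server matches the number of selections the user makes.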

The underlying motivation for having a Web-based Java client access our OO
mapping database is to give end users the ability to access and use NIMA data quickly and
efficiently. At the present time, users of NIMA data must have software to view the
data on their own computer systems, and must obtain the data primarily on
CD-ROM or other storage media, with limited Web interaction available for the user.
Our Java applet allows any user with a Java-enabled Web browser to access our GIDB
over the Internet and to display map data available in their area of interest. In addition
to display of map objects, we have extended the functionality of the Java client to
include simple queries, individual feature selection, zoom capabilities, attribute
queries, geometrical queries, and updates of attribute values.

Figure 17.6 Basic system design.

Figure 17.7 GIDS applet map display.

After  the  selected  features  in  a  user's  AOI  have  been  returned  to  the  Java  client  and 
displayed as shown in Figure 17.9, the user can change the colors of the features to
distinguish between the feature classes retrieved. A color key is shown providing the
color,  feature  class,  and  number  of  those  features  in  the  given  AOI.  The  user  also  has 
the  ability  to  change  the  color  of  the  background.  Zoom  capabilities  are  provided, 
allowing  the  user  to  zoom  in,  zoom  out,  or  zoom  to  a  user-specified  area  in  the  AOI.  An 
individual  feature  may  be  selected  by  clicking  on  it  in  the  map  pane,  resulting  in  the 
display  of  that  feature's  attributes. 

Figure 17.8 User selection window.

Clicking on the Query button below the map pane performs a simple query. This
query lists all of the features in the map pane and gives the user access to each feature's
attribute information. More advanced queries can be performed by clicking on the
Adv. Query button below the map pane. The advanced query screen allows users to
display new feature classes in the AOI. The user can also perform attribute-level
queries. For example, the user can highlight all of the four-lane roads, or all buildings
that function as government buildings. Users can also perform geometrical queries,
such as "find all buildings that are greater than 50 feet from the road," or "find all
homes that are within 20 meters of the Embassy."
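
The two geometrical queries quoted above reduce to distance tests. The Python sketch
below shows one way to express them, assuming planar coordinates in meters, point
features for the buildings and homes, and a polyline for the road; the function names
are hypothetical and the applet's actual query code is not shown in this chapter.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b (planar)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:                 # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, line):
    """Smallest distance from p to any segment of the polyline."""
    return min(point_segment_distance(p, a, b)
               for a, b in zip(line, line[1:]))


def within(features, target, radius):
    """Names of point features no farther than radius from target."""
    return [name for name, pt in features.items()
            if math.hypot(pt[0] - target[0], pt[1] - target[1]) <= radius]


def farther_than(features, line, distance):
    """Names of point features farther than distance from the polyline."""
    return [name for name, pt in features.items()
            if distance_to_polyline(pt, line) > distance]
```

For example, "greater than 50 feet from the road" becomes
farther_than(buildings, road, 50 * 0.3048) once the distance is converted to meters.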

Update  of  feature  attributes  is  also  possible  with  the  Java  client.  For  example,  a 
newly  paved  road  could  have  its  attribute  for  surface  type  updated  from  "gravel"  to 
"concrete."  This  function  of  the  applet  must  be  password  protected  so  that  only  users 
with  authorization  can  change  data  in  the  database. 

Another function available in our Web applet is the ability to perform Internet
queries based on our AOI. A user can perform an Internet query by selecting the
Internet  Query  button,  and  then  selecting  "Weather,"  "News,"  "Yellow  Pages,"  or 
"Other  Maps."  For  example,  if  a  user  decides  to  find  out  the  weather  for  the  current 
AOI  and  selects  appropriately,  the  GIDS  server  will  locate  the  nearest  city  to  the  user's 
AOI  and  will  open  a  Web  page  with  that  city's  local  weather  forecast. 
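
One plausible way to locate the nearest city is to compare the center of the AOI with a
gazetteer using great-circle distance. The Python below is a sketch of that idea only;
the CITIES table, its approximate coordinates, and the function names are assumptions,
and the chapter does not say how the GIDS server performs this lookup.

```python
import math

# Hypothetical gazetteer; coordinates are approximate (latitude, longitude).
CITIES = {
    "San Francisco": (37.77, -122.42),
    "New Orleans": (29.95, -90.07),
    "Washington": (38.90, -77.04),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_city(aoi, cities=CITIES):
    """aoi = ((lat_min, lon_min), (lat_max, lon_max)); returns a city name."""
    center = ((aoi[0][0] + aoi[1][0]) / 2, (aoi[0][1] + aoi[1][1]) / 2)
    return min(cities, key=lambda name: haversine_km(center, cities[name]))
```

The returned name could then be used to build the address of a forecast page, a step
left out here because it depends on the external service chosen.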

Our existing Smalltalk mapping application has been extended to the Web utilizing
a Java interface via CORBA. The success of our effort is exhibited in the current
functionalities of our Java applet on the Web. We have several ongoing projects to improve
our Web application, including the display of raster data. We are investigating ways to
move to a truly distributed database. Additionally, we want to give users the ability to
download data over the Internet from our Java interface to expedite the distribution of
mapping data. Another extension to our Java interface is the ability to display the
features in our map pane in 3D utilizing VRML. We anticipate the user being able to click
on a Render in 3D button to obtain a VRML-generated 3D model of the features in
the current AOI. The open standard of VRML 2.0 is an excellent format for 3D
modeling of land and underwater terrain, natural features, and man-made features. We will
generate 3D models using gridded, TIN (Triangulated Irregular Network), and vector
data. Our VRML models will provide additional information about the AOI by
immersing the viewer into and allowing interaction with a virtual world.

Figure 17.9 Display of selected features.

Once these tasks are accomplished, users interested in a wide variety of mapping
data will be able to access and benefit from our GIDS over the Internet from any
platform using a Java-enabled Web browser. This will allow the functionality of more
powerful server machines to be exhibited on less capable client machines. It will also give
users faster access to mapping data. Our migration to a Web-based mapping client is a
revolutionary way of allowing clients with modest computing resources user-friendly
access to state-of-the-art mapping data and software.


2D GIDS Application


In this section we discuss the 2D GIDS Application. We begin with a discussion of
Urban Warrior and present the IMMACCS architecture in Figure 17.10.


Urban  Warrior 

In  the  post-cold-war  era  there  has  been  more  focus  on  urban  warfare.  The  presence  of 
civilians,  urban  infrastructures,  and  open  space  with  obstructed  views  are  some  of  the 
complications  present  in  urban  warfare.  In  preparation  for  potential  urban  warfare,  the 
Marine Corps Warfighting Lab (MCWL) has performed an experiment and demonstration
on how to conduct combat in urban settings using the latest technology. In
1997,  an  exercise  called  Hunter  Warrior  took  place  with  the  focus  on  fighting  smarter, 
using  less  physical  force,  and  relying  more  on  microchips  [CNN  1997]. 

Since many military operations take place via a chain of command, MCWL focused
on the command and control activity within the Enhanced Combat Operations Center
(ECOC). A specific requirement was imposed in supporting the experiment: the overall
system was required to be object-oriented. MCWL believed that an object-oriented
system would better meet the overall objective of effectively controlling urban
warfare. The Integrated Marine Corps Multi-Agent Command and Control System
(IMMACCS) consists of a number of different components. A real-time display was
required so that officers in the ECOC could visualize the common tactical picture as
activities took place in the "battle space." Stanford Research Institute (SRI) developed
and provided the two-dimensional (2D) view of the urban battle space. Jet Propulsion
Laboratory (JPL) provided a backbone (ShareNet) for all communication among the
IMMACCS participants. Common Object Request Broker Architecture (CORBA) was
the underlying medium that was used to exchange information among different
components of the IMMACCS. California Polytechnic Institute (CalPoly) developed and
provided the intelligent software agents to assist in the decision making process at
the ECOC. SPAWAR's Multi-C4I Systems IMMACCS Translator (MCSIT) ingested all
track information from legacy command and control systems such as Joint Maritime
Command Information System (JMCIS) and translated it into object-oriented format
for the IMMACCS components to access and use. SRI's 2D viewer (InCon) as well as
CalPoly's agents required the urban infrastructure to provide a visualization of the
battle space as well as a capability to reason about the surroundings.

Figure 17.10 IMMACCS architecture.

Both the intelligent agents and the display had two categories of information: static
and dynamic. Static information is geospatial information that encompasses physical
and built-up environments, man-made structures (e.g., buildings, facilities, and
infrastructure), and natural structures (e.g., topography, vegetation, coastlines). Dynamic
information deals with tracking the movements of troops, tanks, helicopters and a
company of marines, and is based upon the urban infrastructure in terms of its
position, mobility, and operation. It is the static information contained in maps that
provides much of the strategic and tactical basis for any military operation. Since NIMA's
mapping products provide mapping data in relational form and MCWL specifically
required the overall system to be object-oriented, DMAP provided the conversion to
object-oriented format through the GIDS. The GIDS was used as the geospatial
component of the IMMACCS system.

Dynamic as well as static urban infrastructure objects were persisted in the Object
Instance Store (OIS) maintained by ShareNet. All objects must be in the OIS to be
accessible by each system component. The OIS stores only the attributes of urban
infrastructure objects, not the positional information. Since the InCon 2D viewer did not
support vector maps for the infrastructure objects, an image was used as a reference
map. Therefore, InCon needed to query the GIDS to determine which objects were
available in the area of interest (AOI). Both the GIDS and the OIS maintained a global
identification of each infrastructure object. When the GIDS provided the global
identification of the objects to InCon, InCon in turn requested other information from the OIS.
This two-step query process was implemented because the attributes of the
infrastructure objects as provided by NIMA are a subset of the attributes defined for each
infrastructure object in the IMMACCS object model (IOM). The OIS provides more
information relevant to the IMMACCS environment. Because of the imposed requirement
to use object-oriented systems, CORBA was readily adopted to connect the different
systems into an integrated system.
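
The two-step query can be sketched as follows. This is an illustrative model only: the dictionaries stand in for the GIDS (which holds positions) and the OIS (which holds attributes), and the identifiers, attribute names, and function names are all hypothetical. In the actual system each step was a CORBA call, which is not modeled here.

```python
# Hypothetical stand-ins for the two stores, keyed by a shared global identifier.
GIDS_POSITIONS = {            # global id -> (x, y) position held by the GIDS
    "bldg-001": (10.0, 10.0),
    "bldg-002": (55.0, 20.0),
    "road-007": (12.0, 14.0),
}
OIS_ATTRIBUTES = {            # global id -> attributes held by the OIS
    "bldg-001": {"type": "building", "use": "hospital"},
    "bldg-002": {"type": "building", "use": "warehouse"},
    "road-007": {"type": "road", "lanes": 4},
}

def gids_spatial_query(aoi):
    """Step 1: ask the GIDS which object ids fall inside the AOI (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = aoi
    return sorted(
        gid for gid, (x, y) in GIDS_POSITIONS.items()
        if xmin <= x <= xmax and ymin <= y <= ymax
    )

def ois_lookup(gid):
    """Step 2: ask the OIS for the attributes of one object."""
    return OIS_ATTRIBUTES[gid]

def two_step_query(aoi):
    """Combine both steps, as the viewer would for the current AOI."""
    return {gid: ois_lookup(gid) for gid in gids_spatial_query(aoi)}

result = two_step_query((0.0, 0.0, 20.0, 20.0))
```

The shared global identifier is what lets the two stores stay separate: neither needs to duplicate the other's data.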

The following GIDS capabilities were provided to the IMMACCS as a part of
the integrated system:

■ Transform the relational vector map information to object-oriented format

■ Upload the urban infrastructure objects to the ShareNet's Object Instance Store
via CORBA

■ Allow InCon to perform spatial query via CORBA

Figure 17.10 shows the overall GIDS support within the IMMACCS.

Additional functionality was tested during the Urban Warrior. An online spatial data
update took place from Bethesda, Maryland, to San Diego, California. This was a
valuable test, which demonstrated that object-oriented technology allows ease of
updating complex data, such as spatial data. This work is discussed in detail in [Chung 1999].


3D  GIDS 


Mapping has been the chief means of geospatial data visualization provided by
traditional Geospatial Information Systems (GIS). A GIS can produce a highly accurate
digital map for a given area of interest, using well-recognized symbols to represent such
features as mountains, forests, buildings and transportation networks. Although this
flat view of the world provides an excellent means of orienting the user to the general
nature and location of the features for a given area, it fails to provide the full experience
that comes from viewing a three-dimensional (3D) environment. To address this
shortcoming, NRL's DMAP, in conjunction with the University of New Orleans' Computer
Science department, has investigated the development of a 3D-GIS that would assist
the U.S. Marine Corps with mission preparation and rehearsal and also provide on-site
awareness during actual field operations in urban areas.

We designed our 3D-GIS to supplement NIMA's traditional 2D digital-mapping
output with a 3D synthetic environment counterpart. The map output remains useful
for general orientation and query functions. The 3D output addresses the targeted
application area. Instead of merely applying photo-textures to highly simplified
geometric shapes, we include detailed, 3D, natural and man-made features such as
buildings, roads, streetlights, and so on. We maximize the user's experience in this synthetic
environment by providing for movement within and interaction with the environment
consistent with the types of interactions expected of marines during anticipated urban
operations. We construct the environment such that the user can walk or fly across
terrain and can walk into buildings through doorways or climb through open windows.
Since we construct synthetic buildings that conform to their real world floor plans,
direct line of sight into and out of buildings through open doorways and windows is
available. Additionally, once inside a building, the user can walk through interior
rooms via interior doorways and climb stairs to different floors.

Our 3D synthetic environment is constructed using an extension of NIMA's Vector
Product Format (VPF) [DoD 1996] designed by DMAP and the University of New
Orleans. The extended VPF, referred to as VPF+ [Abdelguerfi 1998], makes use of a
nonmanifold data structure for modeling 3D synthetic environments. The data
structure uses a boundary representation (B-rep) method. B-rep models 3D objects by
describing them in terms of their bounding entities and by topologically orienting
them in a manner that enables the distinction between the object's interior and exterior.
Consistent with B-rep, the representational scheme of the proposed data structure
includes both topologic and geometric information. The topologic information
encompasses the adjacencies involved in 3D manifold and nonmanifold objects, and is
described using a new, extended Winged-Edge data structure. This data structure
is referred to as "Non-Manifold 3D Winged-Edge Topology."


VPF+

The  data  structure  relationships  of  the  Non-Manifold  3D  Winged-Edge  Topology  are 
summarized  in  the  object  model  shown  in  Figure  17.11.  References  to  geometry  are 
omitted  for  clarity. 

There  are  five  main  VPF+  primitives: 

Entity  node.  Used  to  represent  isolated  features. 

Connected  node.  Used  as  endpoints  to  define  edges. 

Edge.  An  arc  used  to  represent  linear  features  or  borders  of  faces. 

Face. A two-dimensional primitive used to represent a facet of a three-dimensional
object such as the wall of a building, or to represent a two-dimensional area
feature such as a lake.

Eface.  Describes  a  use  of  a  face  by  an  edge. 

Inside  the  primitive  directory,  a  mandatory  Minimum  Bounding  Box  (MBB)  table 
(not  shown  in  Figure  17.11)  is  associated  with  each  edge  and  face  primitive.  Because  of 
its  simple  shape,  an  MBB  is  easier  to  handle  than  its  corresponding  primitive.  The 
primitives  shown  in  Figure  17.11,  except  for  the  eface,  have  an  optional  spatial  index. 
The  spatial  index  is  based  on  an  adaptive-grid-based  3D  binary  tree,  which  reduces 
searching  for  a  primitive  down  to  binary  search.  Due  to  its  variable  length  records,  the 
connected  node  table  has  a  mandatory  associated  variable  length  index. 
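
The value of an MBB as a cheap stand-in for its primitive can be illustrated with a short sketch. The `MBB` class and helper functions below are hypothetical illustrations, not the VPF+ table layout: they show how a box is derived from a primitive's coordinates and how a box-against-box test can reject non-overlapping primitives before any exact geometry is examined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MBB:
    xmin: float
    ymin: float
    zmin: float
    xmax: float
    ymax: float
    zmax: float

def mbb_of(points):
    """Smallest axis-aligned box containing all (x, y, z) points of a primitive."""
    xs, ys, zs = zip(*points)
    return MBB(min(xs), min(ys), min(zs), max(xs), max(ys), max(zs))

def intersects(a, b):
    """True if two boxes overlap or touch on every axis."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax
            and a.zmin <= b.zmax and b.zmin <= a.zmax)

# A wall face 10 units long and 3 units high, lying in the plane y = 0.
wall = mbb_of([(0, 0, 0), (10, 0, 0), (10, 0, 3), (0, 0, 3)])
query_box = MBB(8, -1, 1, 12, 1, 2)   # overlaps one end of the wall
far_box = MBB(20, 20, 0, 25, 25, 3)   # nowhere near the wall
```

Only primitives whose boxes pass `intersects` need the more expensive exact test, which is the usual reason for storing a bounding box alongside each edge and face.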

The ring table identifies the ring forming the outer boundary and all internal rings
of a face primitive. This table allows (along with the face table) the extraction of all of
the edges that form the outer boundary and that form the internal rings of a face
primitive. The entity node and the internal and external rings are not essential to the
understanding of the VPF+ data structure, and as such, will not be discussed further.

Figure 17.11 VPF+ primitive level object model.

The eface is a new structure that is introduced in VPF+ to resolve some of the
ambiguities resulting from the absence of a fixed number of faces adjacent to an edge. Efaces
describe the use of a face by an edge and allow maintenance of the adjacency
relationships between an edge and zero, one, two, or more faces incident to an edge. This is
accomplished by linking each edge to all faces connected along the edge through a
circular linked list of efaces. As shown in Figure 17.12, each eface in the list identifies the
face with which it is associated, the next eface in the list, and the "next" edge about the
face with which the eface is associated. Efaces are also radially ordered in the linked list
in a clockwise direction about the edge. The purpose for the ordering is to make
traversal from one face to the radially closest adjacent face a simple list operation.
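
The circular list of efaces can be sketched in a few lines. This is a minimal model under stated assumptions, not the VPF+ table schema: the `Eface` and `Edge` classes, their field names, and the helper functions are hypothetical, and faces and edges are represented by plain string labels.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Eface:
    face: str                             # face this eface refers to
    next_edge: str                        # "next" edge about that face
    next_eface: Optional["Eface"] = None  # next eface in the radial list

@dataclass(eq=False)
class Edge:
    name: str
    first_eface: Optional[Eface] = None   # None when no face uses the edge

def link_efaces(edge, efaces):
    """Chain efaces (given in clockwise radial order) into a circular list."""
    for cur, nxt in zip(efaces, efaces[1:] + efaces[:1]):
        cur.next_eface = nxt
    edge.first_eface = efaces[0] if efaces else None

def faces_about(edge):
    """All faces incident to the edge, in radial order."""
    faces = []
    ef = edge.first_eface
    while ef is not None:
        faces.append(ef.face)
        ef = ef.next_eface
        if ef is edge.first_eface:        # back at the start of the circle
            break
    return faces

def radially_next_face(edge, face):
    """Face that follows the given face about the edge, or None if absent."""
    ef = edge.first_eface
    while ef is not None:
        if ef.face == face:
            return ef.next_eface.face
        ef = ef.next_eface
        if ef is edge.first_eface:
            break
    return None

shared = Edge("e1")
link_efaces(shared, [Eface("outer wall", "e2"),
                     Eface("floor", "e3"),
                     Eface("interior wall", "e4")])
dangling = Edge("e9")                     # an edge used by zero faces
```

Because the list is circular, stepping from the last eface returns to the first, so moving to the radially adjacent face is a single pointer follow regardless of how many faces share the edge.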

Additionally, VPF's Connected Node Table is modified to allow for nonmanifold
nodes. This requires that a node point to one edge in each object connected solely
through the node and to each dangling edge. This allows the retrieval of all edges and
all faces in each object, and the retrieval of all dangling edges connected to the
nonmanifold node.

Unlike traditional VPF's Level 3 topology, the "universe face" is absent in VPF+'s
full 3D topology, since VPF+ is intended primarily for 3D modeling. Additionally,
since 3D modeling is intended, faces may be one- or two-sided. A two-sided face, for
example, might be used to represent the wall of a building with one side used for the
outside of the building and the other side for the inside of the building. Feature
attribute information would be used to render the two different surface textures and
color. A one-sided face might be used to represent a portion of a terrain surface.
Orientation of the interior and exterior of 3D objects is organized in relation to the normal
vector of faces forming the surface boundary of closed objects. Faces may also be
embedded within a 3D object.
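
How a face normal can separate interior from exterior is sketched below. The convention used here, that the normal computed from the first three vertices points toward the exterior, is an assumption made for illustration; the text does not state which direction VPF+ uses. All function names are hypothetical.

```python
def sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def face_normal(p0, p1, p2):
    """Normal of a planar face from its first three vertices."""
    return cross(sub(p1, p0), sub(p2, p0))

def side_of_face(vertices, point):
    """Classify a point relative to a face whose normal is assumed to point outward."""
    d = dot(face_normal(*vertices[:3]), sub(point, vertices[0]))
    if d > 0:
        return "exterior"
    if d < 0:
        return "interior"
    return "on face"

# A wall in the plane y = 0; with this vertex order the normal is (0, -30, 0).
wall = [(0, 0, 0), (10, 0, 0), (10, 0, 3), (0, 0, 3)]
```

For a two-sided face, the same sign test could select which of the two surface textures to render for a given viewpoint.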


Figure 17.12 Relationship of a shared edge to its faces, efaces, and next edge about
each face. (The figure shows an edge shared by 3 faces, using 3 efaces.)


3D GIDS System Overview

Figure 17.13 shows the flow chart of the steps taken to develop the VPF+ database for
the United States Public Health Service Hospital at the Presidio in San Francisco,
California, for the Urban Warrior project. Flat floor plans of the building were the only
required inputs to this process. These plans provided detailed information about the
building such as floor layouts, dimensions, and means of entry. One of the VPF+ tools we
have developed is an on-screen digitizer that allows a user to interface with scanned floor
plans to extract 3D geometric data and automate production. This allows accurate
definition of the overall exterior of the building and also accurate placement of interior
rooms, windows, and doorways by defining the nodes, edges, and faces that form the
three-dimensional structure of the building. Using this tool and scanned floor plans of the
hospital, 3D data were gathered for the building and used to populate the VPF+ database.

An  example  of  the  result  of  this  process  is  shown  in  Figure  17.14.  Figure  17.14(a) 
illustrates  an  example  of  a  typical  2D  representation  of  the  building  as  might  be  found 
in  a  traditional  VPF  database.  Figures  17.14(b)  and  17.14(c),  on  the  other  hand,  show 
detailed  VPF+  object  models. 

An example of the user interface is shown in Figure 17.15. User interaction is through
a Web browser equipped with a 3D graphic plug-in such as Cosmo Player, and an
application applet. Interactions with the 3D virtual world include the ability to walk into the
building through doorways or climb through open windows, to look through open
doorways and windows either from outside or inside the building, and to enter rooms
through interior doorways and climb stairs to different floors. A 2D map of the hospital
is displayed adjacent to the 3D counterpart in order to provide general orientation.

Figure 17.13 Flowchart of 3D model generation for Urban Warrior.

The USPS Hospital is a complex building consisting of nine floors and hundreds of
rooms. To provide general orientation to marines inside the building, the 3D model is
supplemented with a 2D map of the building. A pointer on the 2D map shows the
user's location within the building and the direction in which he or she is heading. This
can be seen in Figure 17.16. As the user moves between floors, the 2D map is updated
automatically to show the level the user is on.

Experience Report

Through the experience of using an ODBMS and object-oriented technology, we have
learned that memory management, input/output (IO) processing, data structure
requirements, and utilization of fundamental object-oriented design all play a significant
role in the overall performance of the ODBMS. During the development phase of our
system, we encountered multiple instances that highlight the importance of giving
attention to each of these factors. In this section, we will provide several of our experiences
with these factors and will describe what we did to improve our system performance.

Memory management for data storage is a mandatory task because the storage
requirement for objects can be significantly greater than for the relational format.
Objects require more storage for NIMA formats because of the requirement to export
the updated data back into the original source format. To accomplish this, some of the
source information or the relational formats are captured in the object definition to
ensure the capability to export. This adds to the storage requirement.
We believe that the storage increase would shrink if the relational information were
dropped from the object definition, but our requirement for export capability prevents
us from doing so. Consequently, we focus on other ways to minimize storage
requirements. The most dramatic improvement in the size of the database occurred when we
stopped predefining the size of collections. Previously we would use "OrderedCollection
new: 200" with the thought that it would be faster to populate the collection if its size
were preset. However, the ODBMS would utilize the space for the collection even if
only a few objects were actually placed in it. When we changed to using the command
"OrderedCollection new" instead, the database size decreased dramatically (over
tenfold) with no major degradation of performance.

Figure 17.14 The U.S. Public Health Service Hospital at the Presidio, San Francisco,
California: (a) a typical 2D representation; (b) a 3D object model; (c) a cut-away of the
first floor including interior rooms.

Figure 17.15 Sample user interface to the USPS Hospital.
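
The storage effect of presetting collection sizes can be imitated with a toy model. The sketch below is an analogy only: `SlotCollection` is a hypothetical class whose reserved slots are persisted along with its contents, and Python's `pickle` stands in for the ODBMS repository. The actual savings in the authors' Smalltalk ODBMS depend on its own storage format and are not reproduced here.

```python
import pickle

class SlotCollection:
    """Toy collection whose backing slots are stored together with its items."""
    def __init__(self, capacity=0):
        self.slots = [None] * capacity   # reserved slots, possibly never used
        self.count = 0

    def add(self, item):
        if self.count == len(self.slots):
            self.slots.append(None)      # grow on demand, one slot at a time
        self.slots[self.count] = item
        self.count += 1

def build(n_collections, capacity):
    """Create many small collections, each holding only three items."""
    out = []
    for i in range(n_collections):
        c = SlotCollection(capacity)
        for item in (i, i + 1, i + 2):
            c.add(item)
        out.append(c)
    return out

def persisted_size(collections):
    """Bytes needed to serialize the collections (stand-in for repository size)."""
    return len(pickle.dumps(collections))

preset_bytes = persisted_size(build(1000, 200))   # like "OrderedCollection new: 200"
on_demand_bytes = persisted_size(build(1000, 0))  # like "OrderedCollection new"
```

In this model the preset variant stores 200 slots per collection even though only three are used, which is the same pattern the paragraph above describes.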

Memory management in terms of RAM is also extremely important for increased
performance. While testing the initialization of new datasets in our database, we noticed
that performance tended to degrade with time; the first few objects initialized quickly,
but as more objects were initialized, each individual object took longer to initialize.
We concluded that this was due to an increasing number of objects to be managed in
RAM before being persisted in the database repository. To resolve this performance
degradation, we decided to commit or persist the objects in the database repository
periodically during the initialization rather than all at once at the end of initialization. We
chose to persist 100 objects at a time, which eliminated this performance problem.
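
The periodic-commit pattern can be sketched as follows. `FakeSession` is a hypothetical stand-in for an ODBMS session and does not correspond to any real database API; only the batching logic in `initialize` is the point of the example.

```python
class FakeSession:
    """Hypothetical stand-in for an ODBMS session (not a real API)."""
    def __init__(self):
        self.pending = []      # objects held in memory, not yet persisted
        self.committed = []    # objects persisted in the repository
        self.commit_count = 0

    def add(self, obj):
        self.pending.append(obj)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()
        self.commit_count += 1

def initialize(session, records, batch_size=100):
    """Create one object per record, committing every batch_size objects."""
    for i, rec in enumerate(records, start=1):
        session.add({"id": rec})
        if i % batch_size == 0:
            session.commit()   # keeps the number of uncommitted objects bounded
    if session.pending:
        session.commit()       # persist the final partial batch

session = FakeSession()
initialize(session, range(250))
```

With 250 records and a batch size of 100, the session commits three times and never holds more than 100 uncommitted objects in memory.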

Because network-based client-server applications have inherently poorer performance
than single-system applications, we sought to fine-tune the application performance to
make the network hit less noticeable. Since IO processing is the most expensive processor
operation, performance in transforming the relational into object format took a
significant amount of time. This performance drawback was due primarily to the file open
and close operations. Performance improved tremendously when we revised the
procedure to read all the contents of a file at one time, storing the data in memory as a
dictionary or collection for subsequent use. This memory is released once the data have
been used for object creation.

Figure 17.16 Tracking a marine inside the USPS Hospital.
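
Reading a source table once and serving later lookups from memory can be sketched like this. A small CSV file stands in for a relational source table, which is an assumption for illustration; the real source data are NIMA relational tables, and `load_table` and `build_objects` are hypothetical names.

```python
import csv
import os
import tempfile

def load_table(path, key_field):
    """Open the file once and return all rows as a dictionary keyed by key_field."""
    with open(path, newline="") as fh:
        return {row[key_field]: row for row in csv.DictReader(fh)}

def build_objects(path, wanted_ids):
    """Create one object per wanted id using a single pass over the file."""
    table = load_table(path, "id")          # one open/read/close for all lookups
    objects = [dict(table[i]) for i in wanted_ids if i in table]
    table.clear()                           # release the cached rows after use
    return objects

# Demo with a throwaway file standing in for a relational source table.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("id,surface\n1,gravel\n2,concrete\n")
    path = tmp.name
try:
    objects = build_objects(path, ["2", "1", "9"])
finally:
    os.remove(path)
```

The alternative, reopening the file for every lookup, repeats the open and close cost that the paragraph above identifies as the main bottleneck.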

In addition to IO performance tuning, we realized performance improvements after
targeting three other areas for revision:

■ Reimplementing duplicate, lower-level methods as a single method in a
common superclass

■ Storing intermediate results in local variables, rather than having repeated
message sends

■ Reducing the reliance on collections and dictionaries

Because the application was incrementally expanded to handle additional data types,
reuse of existing code was not always optimal. This was due largely to the fact that the
size of the application made it difficult for developers to "see the forest for the trees."
When the time was found to take a broad-based view of the design, it was found that
many methods were significantly overlapping in functionality. Merging these methods
from the class hierarchy into one method at the superclass level resulted in significant
performance improvement. With regard to the second area of performance
improvement, we found that often within the same method, message sends were being repeated
to compute the same results multiple times. Performance improved when local
variables were used to store intermediate results rather than repeating the message sends.
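
The second improvement, replacing repeated message sends with a local variable, looks like this in outline. The `Feature` class and both area functions are hypothetical; the call counter exists only to make the difference visible.

```python
class Feature:
    def __init__(self, coords):
        self.coords = coords
        self.calls = 0                    # counts how often the box is recomputed

    def bounding_box(self):
        """Recompute (xmin, ymin, xmax, ymax) from the coordinates on every call."""
        self.calls += 1
        xs, ys = zip(*self.coords)
        return min(xs), min(ys), max(xs), max(ys)

def box_area_repeated(feature):
    """Sends the same message four times to compute one result."""
    return ((feature.bounding_box()[2] - feature.bounding_box()[0])
            * (feature.bounding_box()[3] - feature.bounding_box()[1]))

def box_area_cached(feature):
    """Sends the message once and keeps the result in local variables."""
    xmin, ymin, xmax, ymax = feature.bounding_box()
    return (xmax - xmin) * (ymax - ymin)

slow = Feature([(0, 0), (4, 0), (4, 3)])
fast = Feature([(0, 0), (4, 0), (4, 3)])
slow_area = box_area_repeated(slow)
fast_area = box_area_cached(fast)
```

Both functions return the same area, but the cached version performs a quarter of the work, and the saving grows with the cost of the repeated call.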

Finally, the use of collections and dictionaries was dramatically decreased. A great
performance degradation was noticed when a dictionary size increased from 1000 to
1001 and thereafter. This is due to the Smalltalk hashing functionality. When an item
has to be added to a full dictionary, the dictionary is split into two. A new dictionary is
created that is 150 percent of the size of the original dictionary. The contents of the
dictionary are then copied to the new dictionary. The performance of this process began to
degrade significantly as the size of a dictionary reached over 1000. Thus, we have
implemented a large dictionary that basically maintains a collection of dictionaries of
size 1000. A new dictionary of size 1000 is created each time a new dictionary is needed.
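
A "large dictionary" built from bounded dictionaries can be sketched as below. The text does not say how keys are routed to the underlying dictionaries, so the linear scan over chunks is an assumption, and `LargeDictionary` is a hypothetical class. Python's built-in `dict` does not have the growth problem described above; the sketch only illustrates the structure.

```python
class LargeDictionary:
    """Collection of dictionaries, each holding at most chunk_size entries."""
    def __init__(self, chunk_size=1000):
        self.chunk_size = chunk_size
        self.chunks = [{}]

    def __setitem__(self, key, value):
        for chunk in self.chunks:
            if key in chunk:               # overwrite an existing key in place
                chunk[key] = value
                return
        if len(self.chunks[-1]) >= self.chunk_size:
            self.chunks.append({})         # start a new dictionary when the last is full
        self.chunks[-1][key] = value

    def __getitem__(self, key):
        for chunk in self.chunks:
            if key in chunk:
                return chunk[key]
        raise KeyError(key)

    def __len__(self):
        return sum(len(chunk) for chunk in self.chunks)

big = LargeDictionary()
for i in range(2500):
    big[i] = i * i
big[5] = 0                                 # updating a key must not add an entry
```

No individual dictionary ever grows past the threshold, so the expensive grow-and-copy step is never triggered; the cost is an extra scan across chunks on each lookup.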

In our ODBMS, each object is assigned a unique object identifier known as an
object-oriented pointer (oop). The oop can be used to access any object in the database
quickly. We make extensive use of the oop for our Web interface. Our Web interface is
a Java applet embedded in an HTML document and accessible over the Web. The applet
communicates with the ODBMS using CORBA. Users of our Web interface want to
obtain our data over the Web quickly. To accomplish this, objects in the ODBMS are
accessed by the applet through utilization of the object oop. In this manner, we take
advantage of the oop to obtain the information that we need from the database, thus
quickly providing data to the user.


Conclusion 


In this chapter, we have shown how a Web-based distributed system for retrieval and
updating of mapping objects has been implemented as the GIDS. The GIDS architecture relies
heavily upon object technology and includes a Smalltalk server application interfaced
to a GemStone ODBMS, Java applet-based client applications, and CORBA
middleware in the forms of VisiBroker and GemORB. The GIDS was the realization of our goal
to have NIMA data available for electronic information distribution and updating, and
played a significant role in the Marine Corps Warfighting Lab's Urban Warrior
Advanced Warfighting Experiment. The architectural components of the system
worked well together; using Smalltalk as the server development environment
allowed us to prototype new capabilities quickly, while Java provided the Web-based
capabilities for the user interface. CORBA proved an excellent choice to serve as a
bridge between the two.

A description and prototype of a 3D synthetic environment using VPF+ was also
discussed. The 3D developments demonstrated how marines could utilize this
technology for an improved situational awareness and mission planning. Users have the
ability to view the environment in a more realistic manner. VPF+ is the vehicle that
allowed the synthetic environment to be constructed with topology intact.
Furthermore, such 3D visualization is Web-enabled through the Web browser plugins. Future
directions consist of bridging the gap between the 2D and 3D by allowing 3D
rendering from the GIDS's 2D display.


ABSTRACT

The Mapping Sciences Section of the Naval Research
Laboratory, Stennis Space Center, has realized the enormous
benefits of spatial data warehousing and database integration
[...] of the Geospatial Information
Database (GIDB). An object-oriented approach was used to
develop an object model that could be easily expanded to
include all geographic data types. With the base of
object-oriented technology, standards such as Common Object
Request Broker Architecture (CORBA) and Virtual Reality
Modeling Language (VRML) enabled 2-dimensional as well as
3-dimensional display over the Internet.

However, in the process of developing the GIDB system, the
question of what to do with all the data became inevitable.
Data are to be used and exploited by users, but what
do users do with all the data? Is the availability of so much
information overwhelming to the users? The use of spatial data
mining to make sense of the wealth of information
in the GIDB is the focus of this paper. After general
discussions of the topic of spatial data mining, we then present
[...] for a fuzzy set model for spatial
relationship determination with the object-oriented model of the
[...]

1.  INTRODUCTION 

The Naval Research Laboratory, with the University of
Mississippi, Tulane University, and Planning Systems,
has developed a Geospatial Information Database,
which allows the combination of National Imagery and
Mapping Agency (NIMA) data (multiple data types, e.g., raster,
vector, text) in a single integrated object-oriented database.
The GIDB is also Common Object Request Broker
Architecture (CORBA)-compliant and directly accessible via
the Internet with support for 3D rendering. The GIDB allows
storage of complex data types (e.g., video and audio) with the
[...] NIMA data types (raster, vector, and text) to enable
[...] queries as well as constrained queries (e.g.,
display all buildings within 30 meters of the plaza area).

The storage of multiple, complex spatial data types implies a
wealth of available information to GIDB users. The GIDB
supports both simple and advanced queries related to both
spatial (position, geometry, topology, etc.) and non-spatial
(attribute) properties of the data. However, improvements to
[...] capabilities are currently underway in the
form of the addition of a spatial data mining component.
Spatial data mining is the determination of spatial or attribute
information not explicitly stored in the database. Various
techniques exist for "mining" of the data. The basics of it
involve determining, through inference and other logical
methods, relationships among the data.

Spatial data mining is a highly desirable addition to the GIDB
for two reasons. First, the sheer volume of the data and the
complexity of the data types is overwhelming to new or [...]
users. Spatial data mining techniques can be used to present a
tailored, simplified view or schema of the data. The simplified
schema can then be used to help users navigate the database.
Second, spatial data mining can be used to aid users involved in
advanced analysis of the data, and in high-level querying. In
this case, non-obvious, implicit relationships among the data
[...] the data types can be determined and made
available for querying from the user.

The focus of our work is on the application of spatial
data mining for advanced spatial and non-spatial queries.
Non-spatial, or attribute, queries are relevant to two-thirds of the
families of data contained in the GIDB: vector and text.
[...] the most complex of the [...], consisting
of spatial data, attribute (non-spatial) data, and [...]
relationships. Examples of attribute data include
the [...] of a road, the purpose/use of a building, or
[...] of a [...]. Examples of topological
data include the designation of two roads that intersect or two

providing  hypertext  capabilities  directly  from  the 
Text  paragraphs  are  associated  with  positional  characteristics  in 
the  form  of  latitudinal  and  longitudinal  coordinates,  thus  giving 
the  textual  data  a  spatial  component.  Text  is  used  for
indicating,  for  example,  sailing  directions  for  a  specific  area,  or 
dangerous  marine  animals.  Multimedia  data  in  the  form  of 
video  and  audio  clips  and  images  are  used  in  conjunction  with 
the  text  to  provide  the  user  with  complete  information. 

Spatial information, on the other hand, takes the form of vector
coordinates for the vector data, and image frames and
subframes for the raster data. As mentioned above, text data
also has a spatial component in the form of an indexing
coordinate, similar to that used in gazetteers.

The examples of data types given above should convey the
complexity of spatial and non-spatial data available in the
GIDB. All data is stored within an Internet-accessible
object-oriented database. The object-oriented database schema is used
as the basis for data mining of information related to
metadata-level concepts, while the data instantiations are used for
determining specific value-related data mining queries.

Based on the GIDB developed by NRL Code 7441, this
research is exploring spatial reasoning within a spatial and
object-oriented domain. Each of the fields is still premature,
although more work has been published on spatial reasoning in
recent years. An effort to incorporate research in spatial data
mining and exploring characteristics of object-oriented
technology for spatial data is the research domain of this paper.
In particular, we focus on the application of fuzzy logic
techniques for implementing spatial data mining for topological
and directional relationships.

The following section provides background information on the
motivation for spatial data warehousing/mining needs for
military users. Section 3 provides an overview of spatial data
mining, followed by current research on data mining in
object-oriented databases in Section 4. Since this paper will use GIDB
as a baseline application for spatial and object-oriented data
mining, an overview of the application is also provided. Section
5 presents a framework for fuzzy modeling of spatial
relationships. Section 6 continues with the integration of this
model with the GIDB application. Section 7 explores the
impact of spatial access methods on the model, and Section 8
concludes with the potential growth of this field, as well as
needed research topics in the area of spatial data mining in
object-oriented databases.


2.  BACKGROUND 

With the explosion in the amount and availability of digital
geographic data, many users are frustrated in trying to collect,
compile, and analyze data in different formats (e.g., a textbook, a
chart, vector data) and over different display means (e.g.,
different windows for different visualizations). Users want to
collect and display the resulting information for easier
compilation and analysis. Military digital mapping users have these
same needs.

Military users are dependent on maps to plan and conduct their
operations. To military users, maps or mapping data capture the
real-world entities that allow them to execute mission planning,
mission maneuvering, and tactical operation planning. The
National Imagery and Mapping Agency (NIMA) provides maps
and mapping data to military users. NIMA currently has three
digital mapping data formats: vector data in Vector Product
Format (VPF), raster data in Raster Product Format (RPF), and
text data in Text Product Standard (TPS). When military users,
such as marines, have a need for all mapping information
available over a certain geographic region or an area of interest
(AOI), they ask NIMA to provide such information.

Once marines receive the data from NIMA, they are faced with
more difficulties. Necessary information may be in VPF, RPF,
and TPS format. It would make sense to have a system that
would geo-reference the data from different data formats and
display it. Yet, the military does not currently have a system that
could display the three data types together. Furthermore, there
is not a system that could digitally catalog the data for easier
and faster access later.

This predicament can be analyzed as having the following
components: data, integration, and visualization (figure 1).
The data component deals with the differences in the data
format. Integration addresses the issue of data access across
different data formats. Finally, visualization is the component
in which the information content of the integrated data is
displayed. With these concepts, terms such as data warehouse
for storing data, integrated database for allowing seamless data
access across different data formats, and database management
systems for easier and faster access have become popular
phrases among data users.


Figure 1. Target areas for improving users' interface with
spatial data (data, integration, visualization).


Next, the question becomes how to best utilize the integrated
data, otherwise known as data analysis. Common data analysis
efforts involved one or more analysts who became experts on
the data, providing summaries and generating reports. Stored
data provide useful information; however, raw data that has had
no interpretation applied is rarely of direct benefit, especially in
an integrated environment. Thus, we must add another
component to the system (figure 2).


Figure  2.  New  target  area  for  improvement,  data  analysis, 
integrated  with  prior  system. 


The true value of data is predicated on the ability to extract
useful information for decision support or exploration, and on
understanding the phenomena governing the data source. In an
effort to best analyze the integrated data, the following must be
[...]:

A wealth of stored information lies within the integrated data.

Potentially more information could be gleaned or inferred from
the stored data.

Possible relationships and associations that exist between and
[...].

Stored data, inferred information, and information from
observing relationships and associations between and among
data meet one objective: finding or discovering more
information or knowledge.

Data mining assumes the role of advanced query on data.
Relevant data are first examined. Then, various predicates are
imposed on the queried objects to infer and compile the
meaning of the collectively queried data. In summary, data
mining is a query process that explores distinct information, as
well as intrinsic relationships and associations between and
among data.


3.  SPATIAL  DATA  MINING  OVERVIEW 

When a user is given a set of data and a task to perform, the
data is compiled, interpreted, and reasoned within the context
of accomplishing the task. It is unrealistic for users to examine
a large volume of spatial data in detail and extract interesting
knowledge or general characteristics from spatial databases.
Spatial data mining is a field of study that mines knowledge
from large amounts of spatial data. Discovering knowledge, or
data mining, in spatial databases is the extraction of interesting
spatial patterns and features, general relationships between
spatial and non-spatial data, and other general data
characteristics not explicitly stored in spatial databases [15].
Such discovery may play an important role in understanding
spatial data, capturing intrinsic relationships between spatial
and non-spatial data, presenting data regularity in a concise
manner, and reorganizing spatial databases to accommodate
data semantics and achieve high performance [14].

Geographic, or mapping, data consist of spatial descriptions
and non-spatial descriptions. Each representation of a
real-world geographic entity is known as a feature. Spatial
descriptions of features provide geometrical information, such
as spatial location, perimeter (boundary) and area, and
topological information such as adjacency, inclusion, etc.
Non-spatial descriptions of spatial data are considered attributes.
Attributes can include information such as a feature's name, and
ancillary information useful for situation analysis. Data
mining, therefore, consists of using these two classes of data to
infer and discover more information that was not explicitly
stored.

Much study has been devoted to spatial data mining. Prototypes
such as GeoMiner [12] provide a spatial data mining concept.
Koperski describes three primitives of spatial data mining:

Spatial characteristic rule - general description of spatial data.

Spatial discriminant rule - general description of the features
discriminating or contrasting a class of spatial data from other
class(es).

Spatial association rule - rules which describe the implication
of one or a set of features by another set.

Through data mining, one or a combination of the three rules
can be discovered. Stored information may only provide finite
and explicit characteristics of spatial data. However, data
mining may provide other characteristics of certain spatial data
[...] on other spatial data. A spatial discriminant rule provides
those characteristics that are distinct from other spatial data,
e.g., roads along a coast tend to be two-lane roads. A spatial
association rule provides a more compound description of spatial
data when certain relationships and associations can be made to
other spatial data, e.g., buildings on highly elevated areas have
pitched roofs.

Several algorithms for knowledge discovery are used based on
each or a combination of the three rules. These may consist of,
for example: generalization, clustering, exploring spatial
associations, and using approximation and aggregation. Those
algorithms that discover spatial characteristic rules are
considered within the classes of generalization and clustering.
Generalization requires extensive background knowledge. A
concept hierarchy is used to allow spatial features to be
generalized (bottom-up approach), or specified (top-down
approach). For spatial data, two types of concept hierarchies are
needed: spatial and non-spatial, or attribute-based. A spatial
concept hierarchy can be viewed as a breakdown of spatial
entities by earth, continents, countries, provinces, states,
[...]. This can be achieved by utilizing an appropriate
spatial indexing scheme, e.g., quadtree, R-tree. An exhaustive
study on different spatial indexing schemes is presented in [11,

Koperski listed three possible approaches for an attribute-based
concept hierarchy: (1) climb the concept hierarchy when
[...] changed to a [...] value, (2) remove
attributes when further generalization is impossible or too many
[...], and (3) [...].

Spatial data mining involving clustering eliminates the need for
background knowledge. Based on the characteristics and nature
of data, a representative object is searched while clustering
those related objects together. Several approaches are presented,
including PAM, CLARA, CLARANS, CF trees, and
BIRCH [14]. All these approaches use attributes to characterize
each feature by its similarities among other features.

Spatial association rules discover relationships among
objects. Spatial association rule discovery by Koperski and Han
[...] requires a minimum support or minimum confidence. This
constrains the search and discovery to be over those areas that
at least meet this minimum support or confidence level.
Therefore, a spatial association rule is of the form X -> Y (c%),
where X and Y are spatial or non-spatial predicates and c% is
the confidence of the rule. Spatial predicates may be derived
[...].
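
The rule form X -> Y (c%) and the minimum support and confidence thresholds can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: support is taken as the fraction of all features satisfying both X and Y, confidence as the fraction of features satisfying X that also satisfy Y, and a rule is kept only when both thresholds are met. The function names and the sample building records are hypothetical and are not drawn from the GIDB.

```python
def rule_stats(features, x, y):
    """Return (support, confidence) for the rule X -> Y over a list of features.

    Assumed definitions: support = |X and Y| / |all features|,
    confidence = |X and Y| / |X|.
    """
    total = len(features)
    n_x = sum(1 for f in features if x(f))
    n_xy = sum(1 for f in features if x(f) and y(f))
    support = n_xy / total if total else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


def accept_rule(features, x, y, min_support, min_confidence):
    """Keep a rule only if it meets both thresholds (an assumption of this sketch)."""
    support, confidence = rule_stats(features, x, y)
    return support >= min_support and confidence >= min_confidence


# Hypothetical sample data: buildings with an elevation class and a roof type.
buildings = [
    {"elevation": "high", "roof": "pitched"},
    {"elevation": "high", "roof": "pitched"},
    {"elevation": "high", "roof": "pitched"},
    {"elevation": "high", "roof": "flat"},
    {"elevation": "low", "roof": "flat"},
]


def is_high(feature):
    """Predicate X: the building lies on a highly elevated area."""
    return feature["elevation"] == "high"


def has_pitched_roof(feature):
    """Predicate Y: the building has a pitched roof."""
    return feature["roof"] == "pitched"


support, confidence = rule_stats(buildings, is_high, has_pitched_roof)
print(f"is_high -> has_pitched_roof ({confidence:.0%}), support {support:.0%}")
```

With this sample, three of the four highly elevated buildings have pitched roofs, so the rule would be reported with a confidence of 75% and would be discarded by any confidence threshold above that value.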
4.  OBJECT-ORIENTED  DATABASE  MINING

An object-oriented data model provides rich data structure and
semantics more relevant to complex data, such as spatial data.
The paradigm has characteristics such as class hierarchy, data
and behavior, and polymorphism. These characteristics will
augment or impact data mining within an object-oriented data
domain. For example, class hierarchies are rigid [...]
flexible; schema migration and evolution could potentially
impact an existing data mining model within a given system.
As always, a careful design of the data model is required before
a system is implemented.

Class hierarchy could support a concept hierarchy if class
definitions are deep. Property inheritance is assumed within
class hierarchy consideration. Class membership is determined
by the similarities that are inherent in all sub-classes. Thus,
object model and class definition need to be considered
seriously before developing an object-oriented data model for
data mining.

Derivation of a concept hierarchy from a class hierarchy
provides information about objects from general to specific or
vice-versa. Thus, a class hierarchy tree may be used as a means
of generalizing an object to its super- or parent class object.
Objects can be clustered based on the class membership. Han
[1~] provides three characteristics of each object in a class
pertaining to data mining:

•  An  object  identifier, 

•  A  set  of  attributes,  and 

•  A set of methods that specify the computational routines
or rules associated with the object class.

An object identifier may be an index to a class in a class
hierarchy. Indexing schemes are used to facilitate quick data
access in a large database. Much work on indexing schemes in
object-oriented data models has been published, e.g., Class
Hierarchy index (CH-index), nested-index, and multi-index [...].
An organization of classes by some indexing scheme
corresponds to a class hierarchical structure. Thus, a class
definition can be generalized from subclass to superclass via
the indexing scheme or vice-versa.

Updating the indexing scheme based on schema evolution
becomes an issue. Han implies that the object identifier needs
to remain unchanged over structural reorganization of data.
However, schema evolution does take place unless careful
design with all parameters known during the design stage is
[...].

The state of an object is determined by its attributes. A set of
attributes can be as simple as an integer type, or can be as
complicated as a reference to another object. Within an
object-oriented data model, there may exist composite objects, that is,
objects that are composed of other objects, each of which may
be composed of more objects, etc. The entire set of such
relationships is known as a composition hierarchy or web.
These relationships greatly increase the complexity of an object
definition. A depth of information traversal determines the level
of data mining for complex objects. However, this is also an
advantage for object-oriented data. By tapping into one object,
information or the state of its information can be found and
inferred. Inference may result from an association
relationship that exists between or among objects by reference.
A simple traversal through an object web could cause the
discovery of an association rule between and among objects.

An object also encapsulates its behavior or functions. [...]
functions could be implemented to support data mining. In
reference to the data encapsulation advantage for complex
objects, an object identifier could be found as a result of an
attribute, or as a result of a function invoked to find the
information. Certain behavior allows specific functions to be
executed based on criteria imposed on the execution. This
[...] dynamic and parametric invocation of some known
behavior.

Object-oriented design and data structure can affect the
non-spatial portion of the spatial data in data mining. A class
hierarchy with concept hierarchy will support generalization of
object descriptions. The state of an object could be readily
accessed for a given object. An indexing scheme could be used
to expedite the access. Once the state is known and found, an
object could be clustered with similar objects. In addition,
based on the references to other objects, an object composition
hierarchy could provide a window to infer relationships and
associations that exist between and among objects. Based on
parametric invocation of behavior, an object can be constrained
to behave a certain way under certain conditions to provide
varying knowledge about an object.


5.  [...]  MODEL  FOR  SPATIAL  RELATIONSHIP

The ability to distinguish between similar spatial relationships
and to communicate subtle differences in such relationships is a
difficult task for automated systems. An especially difficult task
is to determine the directional relationship between
two-dimensional features. Fuzzy methods associated with linguistic
variables [19] are the most promising approach so far for the
problem of spatial relationship determination.

As an example, consider the three scenes pictured in figure 3.
In figure 3(a), it is unclear whether the statement "A is west of
B" or "A is southwest of B" better describes the directional
relationship between the two. In figure 3(b), however, it is
much less controversial to state simply that "A is west of B."
Similarly, in figure 3(c), most would agree that now "A is
southwest of B." When given a choice, however, many people
would choose to include "hedges" in an attempt to convey more
accurately the pictured relationship. For example, in describing
figure 3(b), one might state that "A is mostly west of B." A similar
problem occurs in the description of topological relationships.
Because human reasoning is typically qualitatively, rather than
quantitatively, based, people do not often care to know, for
example, that "86% of object A overlaps 4% of object B." It is
[...] simply [...] of object A [...].

Previous work [5-8] has been performed in the arena of the
definition and interpretation of fuzzy binary spatial
relationships. This is used as a basis of our spatial data mining
research. In particular, we concentrate on a data structure that
represents topological and directional relationships, as well as
supplementary information needed for fuzzy query processing.
The data structure, known as an abstract spatial graph (ASG),
represents a transformation of 2-dimensional space (areas) into
0-dimensional space (points). A complete set of ASGs for the
original relationships, including a graphical representation and
specific property sets, was developed in [7].


First-level topological relationship definitions are based on an
extension of Allen's temporal relations [1] to the spatial
domain. In this work, Allen showed that the seven
relationships before, meets, overlaps, starts, during, finishes,
and equal, along with their inverses, hold as the complete set
of relationships between two intervals. Cobb [7] extended
these to two dimensions by defining a spatial relationship as a
tuple [rx, ry], where rx is the one of Allen's relationships that
represents the spatial relationship between two objects in the x
direction, and ry is likewise defined for the y direction. Objects
involved are assumed to be enclosed by Minimum Bounding
Rectangles (MBRs). (VPF, along with many other spatial data
formats, provides MBRs as approximate feature boundaries in
conjunction with detailed boundary representations.)
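
As a minimal sketch of the [rx, ry] tuple, the following Python code classifies two intervals into one of Allen's seven relationships or an inverse, and applies the classification separately to the x and y extents of two MBRs. The MBR layout (xmin, ymin, xmax, ymax), the function names, and the "-inverse" suffix for inverse relationships are assumptions made for illustration; they are not taken from the GIDB or from [7].

```python
def allen(a, b):
    """Classify interval a against interval b; each is (low, high) with low < high."""
    (a1, a2), (b1, b2) = a, b
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must have positive length")
    if a2 < b1:
        return "before"
    if a2 == b1:
        return "meets"
    if a1 == b1 and a2 == b2:
        return "equal"
    if a1 == b1 and a2 < b2:
        return "starts"
    if a2 == b2 and a1 > b1:
        return "finishes"
    if a1 > b1 and a2 < b2:
        return "during"
    if a1 < b1 < a2 < b2:
        return "overlaps"
    # None of the seven holds, so one of them holds from b to a.
    return allen(b, a) + "-inverse"


def relation_2d(m, n):
    """Return the tuple (rx, ry) for MBRs given as (xmin, ymin, xmax, ymax)."""
    rx = allen((m[0], m[2]), (n[0], n[2]))
    ry = allen((m[1], m[3]), (n[1], n[3]))
    return rx, ry


# Hypothetical MBRs chosen so that M [overlaps, starts] N.
M = (0, 0, 4, 2)
N = (2, 0, 6, 5)
print(relation_2d(M, N))
```

Because each axis is classified independently, the comparison needs only the four bounding-box coordinates of each feature, not the detailed boundary representation.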


The construction of an ASG for a binary spatial relationship is
dependent upon these object sub-groups. Each object
sub-group is represented as a node on the ASG. Pictorially,
ASGs are represented in a polar graph notation, where different
node representations are used to distinguish between the objects
involved in the relationship. The origin node represents the
reference area of the relationship, which can be a sub-group of
one of the objects, an overlapping area, or a common boundary.

An example of an ASG and its corresponding relationship is
shown in figure 2.

Figure 2. M [overlaps, starts] N relationship and
ASG.


Each node in an ASG has associated area weights and total
node weights to provide support for fuzzy query processing.
These weights provide information concerning the degree of
participation in a relationship and relative direction,
respectively, and are used to define fuzzy qualifiers for the
query language. Specifically, the weights are intended to
support queries of the nature "To what degree is region A south
of region B?" or "How much of region A overlaps region B
(qualitatively speaking)?" The exact manner in which the
weights are computed can be found in [5, 7], while a discussion
of variations on the weight calculations based on MBR
refinements is the focus of [9].


By assigning ranges of area weights to linguistic terms we can
provide a basis for processing queries concerning qualitatively
defined relationships. The set given below is one example of
how this may be done.


{all (96-100%), most (60-95%), some (30-59%), little
(6-29%), none (0-5%)}

Node weights are utilized in a similar manner to provide
qualitative directional relationship information. The purpose of
node weights is to answer the extent to which an object can be
considered at a given direction in relation to another object.
Again, ranges are provided that define a linguistic set useful for
query purposes. These are given below.

{directly (96-100%), mostly (60-95%), slightly (30-59%),
somewhat (6-29%), not (0-5%)}

The  use  of  these  qualifiers  is  illustrated  in  the  following: 

•  Is  object  E  somewhat  north  of  object  A? 

•  Retrieve  an  object  directly  west  of  object  A. 

•  Does  most  of  object  A  overlap  some  of  object 
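
The mapping from weights to linguistic qualifiers can be sketched as a simple range lookup. The ranges below are the example sets given above; the upper bound of 59% for "slightly" is assumed by analogy with "some", and rounding a fractional weight to the nearest whole percent is likewise an assumption of this sketch, as are the function and variable names.

```python
# (term, low %, high %) for area weights, from the example set in the text.
AREA_TERMS = [
    ("all", 96, 100),
    ("most", 60, 95),
    ("some", 30, 59),
    ("little", 6, 29),
    ("none", 0, 5),
]

# (term, low %, high %) for node weights; the 59% bound is an assumption.
NODE_TERMS = [
    ("directly", 96, 100),
    ("mostly", 60, 95),
    ("slightly", 30, 59),
    ("somewhat", 6, 29),
    ("not", 0, 5),
]


def qualifier(weight_percent, terms):
    """Map a weight in [0, 100] to its linguistic term."""
    weight = round(weight_percent)  # ranges are stated in whole percents
    for term, low, high in terms:
        if low <= weight <= high:
            return term
    raise ValueError(f"weight out of range: {weight_percent}")


# "86% of object A overlaps 4% of object B", restated qualitatively.
print(qualifier(86, AREA_TERMS), "of A overlaps", qualifier(4, AREA_TERMS), "of B")
```

A query such as "Is object E somewhat north of object A?" would then reduce to comparing the qualifier derived from the relevant node weight with the term used in the query.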


6.  INTEGRATION  OF  THE  ASG  AND  OO  DATA 
MODELS 

The ASG model provides a structured framework upon which
fuzzy spatial queries related to topology and direction can be
resolved. The full ASG model includes not only the spatial
data structure described in the previous section, but also
higher-level data structures suitable for building an index for such
fuzzy queries, and a set of properties that can be exploited to
resolve the queries efficiently. For example, a transitivity table
for the complete set of relationships has been derived that could
be used to quickly determine the 2D relationship between A
and C, given the relationships between A and B, and between B
and C.

The purpose of this section is to show how the ASG model can
be incorporated into the existing GIDB OO data model. We
concentrate on two aspects: (1) integration at the level of the
geographic features, and (2) integration at the indexing level.
The first step is to show how the relationships can be
represented in the GIDB OO model. We then discuss, in
general, how an indexing scheme can be merged with the
current quadtree implementation for query purposes.

Every VPF feature, both in its original relational, and in its
transformed object format, has an MBR represented through a
pair of coordinates, one for the lower left corner, and one for
the upper right corner of the bounding box. Because the 2D
relationships upon which the ASGs are formed are well-defined,
based on tri-valued (<, >, =) relations of the bounding
box coordinates, we can store this set of relationship definitions
as a global object. Indexing of these definitions based on the
mutually exclusive relationship sets derived in [diss] (i.e.,
surrounded-by, tangent, etc.) can be performed to quickly
derive a fuzzy relationship label.

For example, the inexact relationship partially-surrounded-by
is mapped to a set of 7 basic relationships defined by MBR
bounding box interactions. We know that any one of those 7
binary relationships will be equated to the
partially-surrounded-by spatial relationship. At that point, the ASG area
weights are used to impart qualitative (linguistic) refinements
to the relationship. Figure 5 shows how a cross-indexing
scheme can be maintained to provide efficient access for both
location and relationship-based queries. The figure shows the
pointers (implemented as instance variables) from the features
to a corresponding ASG. Obviously, a feature has a
one-to-many relationship to ASGs, while each ASG corresponds to
exactly two features. The feature, represented strictly as a
pointer to an object, is represented in the quadtree structure to
facilitate efficient location-based querying, and in the ASG
collection (indexed by fuzzy relationship terms, as mentioned
earlier) for potential relationship data mining operations.


Figure 5. Quadtree and ASG indexing for queries.


Because the number of possible binary relationships in a
database with N objects is O(N^2), it is not very likely that one
would want to calculate and store all the relationships at one
time. It is more practical in the cases of large spatial databases
to compute the relationships as needed based on queries (e.g.,
find all adjacent building structures) and to store any computed
relationships for help in determining subsequent relationships.
This could be accomplished, for example, through the
transitivity table mentioned earlier, or through MBR filtering
techniques such as that described in Clementini [3].
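
The compute-as-needed strategy can be sketched as a cache keyed by feature pair. This is only an illustration: the class name, the use of string feature identifiers, and the simple intersects/disjoint test standing in for the full relationship computation are all assumptions, and the sketch does not implement the transitivity table or the filtering technique of [3].

```python
def mbr_relation(a, b):
    """Stand-in relationship test on MBRs given as (xmin, ymin, xmax, ymax)."""
    overlap_x = a[0] <= b[2] and b[0] <= a[2]
    overlap_y = a[1] <= b[3] and b[1] <= a[3]
    return "intersects" if overlap_x and overlap_y else "disjoint"


class RelationshipCache:
    """Compute a pairwise relationship on first request and store it for reuse."""

    def __init__(self, compute):
        self._compute = compute
        self._store = {}
        self.computed = 0  # number of relationships actually calculated

    def relation(self, id_a, mbr_a, id_b, mbr_b):
        key = (id_a, id_b)
        if key not in self._store:
            self._store[key] = self._compute(mbr_a, mbr_b)
            self.computed += 1
        return self._store[key]


# Hypothetical features; only the pairs a query touches are ever computed.
cache = RelationshipCache(mbr_relation)
first = cache.relation("bldg-1", (0, 0, 2, 2), "bldg-2", (2, 0, 4, 2))
second = cache.relation("bldg-1", (0, 0, 2, 2), "bldg-2", (2, 0, 4, 2))
print(first, second, cache.computed)
```

Only pairs that a query actually touches are stored, so the cache grows with the query workload rather than with the square of the number of features.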


7.  SPATIAL  ACCESS  METHOD  IMPLICATIONS


Various models for data mining have been proposed, such as
DBLEARN/DBMINER [12], Holsheimer et al. [13], and
Matheus [16]. A general architecture proposed by Matheus is
used in this paper as shown in figure 6. This section will focus
on the DB Interface component of the knowledge discovery
process.

In general, databases store and maintain persistent data. To
support any application, the data would have to be fetched from
the database as a form of a query request. Formally, a purpose
of an interface to a database is to facilitate a selection of [...]
objects that meet user-defined constraints. Shekar et al. list
several interfaces to DBMS such as index structure, spatial join,
and views [18].

Indexing structures support faster access and retrieval, thereby
providing efficient processing of object fetches. Koperski [...]
stated that the DB Interface in general has spatial data
structures as well as query optimization. Ester [10] indicated
that a spatial access mechanism (SAM) provides efficient
processing of a selection of objects that fulfill some conditions
specified by a user from the database. For spatial [...]
has been gaining research interest [10, 11, 14]. However, Gaede
[11] reminds the spatial data community that no one SAM is
superior to another. Instead, he states, "Both time
and space efficiency of an access method strongly depend on
the data processed and the queries asked."

The GIDB uses a quadtree as the interface to the
object-oriented DBMS. A quadtree is an approximation mechanism
based on MBRs. It is a simple and intuitive method of
organizing spatial data in a geographically hierarchical or
geographically clustering manner. The principle behind the
construction of a quadtree is based on a recursive division of
regions into four equal-sized cells, quadrants, until the cell that
will minimally contain the MBR is found. For any MBR that
overlaps more than one quadrant, the MBR is maintained at the
parent quadrant.
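
The placement rule described above can be sketched in Python as follows. The sketch assumes square cells, MBRs given as (xmin, ymin, xmax, ymax), and a maximum tree depth; the `Cell` class and `insert` function are hypothetical names for illustration and do not reproduce the GIDB classes.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """One square quadtree cell with its lower left corner at (x, y)."""
    x: float
    y: float
    size: float
    level: int
    features: list = field(default_factory=list)
    children: dict = field(default_factory=dict)


def insert(cell, name, mbr, max_level):
    """Store a feature in the smallest cell that fully contains its MBR."""
    xmin, ymin, xmax, ymax = mbr
    if cell.level < max_level:
        half = cell.size / 2
        for qx in (0, 1):
            for qy in (0, 1):
                cx, cy = cell.x + qx * half, cell.y + qy * half
                fits_x = cx <= xmin and xmax <= cx + half
                fits_y = cy <= ymin and ymax <= cy + half
                if fits_x and fits_y:
                    child = cell.children.get((qx, qy))
                    if child is None:
                        # Quadrants are created only when a feature needs them.
                        child = Cell(cx, cy, half, cell.level + 1)
                        cell.children[(qx, qy)] = child
                    return insert(child, name, mbr, max_level)
    # No single quadrant contains the MBR (or the depth limit was reached),
    # so the feature is kept at this cell.
    cell.features.append(name)
    return cell


root = Cell(0.0, 0.0, 16.0, 0)
small = insert(root, "a", (1, 1, 2, 2), max_level=3)
straddling = insert(root, "b", (7, 7, 9, 9), max_level=3)
print(small.level, straddling.level)
```

In this example feature "a" descends to the deepest allowed level, while feature "b" straddles the center of the root region and therefore stays at the root cell.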

A SAM, such as a quadtree, along with object-orientation
provides advanced search and analysis capabilities. Properties
such as data encapsulation and object nesting allow spatial
search as well as attribute search to take place. As a spatial
search is conducted, spatial relationships can be computed and
determined among objects in a quadtree. As indicated, a
quadtree is a geographically hierarchical tree. In other words,
objects are clustered spatially. Objects that are geographically
located in the northwest corner of the United States, for
example, will be placed along one branch of a quadtree.
Therefore, retrieval will only occur along one branch of a tree
for data in the same region.

In GIDB, a quadtree implementation has the following object
structures.


VPFSpatialDataManager: manager of any SAM
(for GIDB, a quadtree is used)

    topCell          root of a tree
    maxCell          quadtree creation is based on maximum
                     level of trees
    storedContainer  an instance of VPFSpatialContainer

VPFSpatialContainer

    attribDict       a collection keyed by Feature and
                     Attribute Coding Catalogue
    features         a collection of features that can be
                     minimally contained in the cell
    cell             a container can have at most four
                     equal-sized cells
    asg              a collection of asg indices between
                     features

VPFSpatialDataCell: class definition of
each quadrant of a quadtree

    superCell        a backpointer to the parent cell
    level            current level of a cell
    manager          a backpointer to VPFSpatialDataManager
    origin           lower left corner of a quadrant
    corner           top left corner of a quadrant
    lowerLevelWidth  width of a cell in next level of a
                     quadtree
    container        a reference to an instance of
                     VPFSpatialContainer
    id               unique identification of a cell in a
                     quadtree

Figure 6. General data mining architecture by Matheus [16]

The VPFSpatialDataManager and VPFSpatialDataCell classes
describe the quadtree structure. The VPFSpatialContainer class
actually contains the data. Each instance of the
VPFSpatialContainer class contains a collection of features,
an attribDict, and an asg. Features is a collection of features whose
boundingBox is minimally contained by a cell. An attribDict
contains an attribute indexing scheme based on the attributes of
features in the features collection. Due to the data
encapsulation property of OO, each feature knows all its
attribute information; however, an indexing scheme is used for
performance reasons. The attribDict is indexed by the attribute
type. For each attribute type, another collection is maintained if
an attribute is a multivalued attribute. Otherwise, a collection of
those features that contain such an attribute will be collected in
attribDict. For a multivalued attribute, a collection is
maintained that uses the actual value of the attribute as the
index. For each attribute value, a collection of features that has
the attribute and the attribute value is maintained. An asg is a
collection of asg values between feature A and feature B, as
described in the previous section.
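
The two-level indexing scheme can be illustrated with a short Python sketch. It is only an approximation of the idea under stated assumptions: the function name `build_attrib_dict`, the feature dictionaries, and the attribute name `NAM` are hypothetical, and the caller is assumed to know which attribute types are multivalued.

```python
def build_attrib_dict(features, multivalued):
    """Index feature ids by attribute type, and by value when multivalued.

    features: list of dicts such as {"id": 1, "attrs": {"LTN": 2}}.
    multivalued: set of attribute types that get a second-level index
    keyed by the actual attribute value.
    """
    index = {}
    for feature in features:
        for name, value in feature["attrs"].items():
            if name in multivalued:
                # attribute type -> attribute value -> feature ids
                by_value = index.setdefault(name, {})
                by_value.setdefault(value, []).append(feature["id"])
            else:
                # attribute type -> feature ids that carry the attribute
                index.setdefault(name, []).append(feature["id"])
    return index


# Hypothetical features: three roads with a lane count (LTN) and a name (NAM).
roads = [
    {"id": 1, "attrs": {"LTN": 2, "NAM": "Main"}},
    {"id": 2, "attrs": {"LTN": 4, "NAM": "Canal"}},
    {"id": 3, "attrs": {"LTN": 2}},
]
attrib_dict = build_attrib_dict(roads, multivalued={"LTN"})
two_lane_roads = attrib_dict["LTN"][2]
```

With such an index, a request for two-lane roads becomes a dictionary lookup rather than a scan over every feature in the cell.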

If a user poses a query such as “find all roads that have two
lanes that are close to government buildings,” two types of
search are needed: an attribute-constrained search as
well as a spatially constrained search. In all spatial searches, an
assumption  is  made  that  an  area-of-interest  (AOI)  has  been 
selected.  Based  on  the  AOI,  the  search  begins  at  the  root  of  a 
quadtree  and  progresses  until  all  cells  that  intersect  the  AOI  are 
selected. 
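
The AOI-driven traversal can be sketched as a recursive descent that prunes every branch whose extent misses the AOI. The dictionary-based cell layout and the names `intersects` and `cells_in_aoi` are assumptions for illustration only; rectangles are `(xmin, ymin, xmax, ymax)` tuples, and touching edges are treated as intersecting.

```python
def intersects(a, b):
    """True if rectangles a and b share any area or boundary point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_in_aoi(cell, aoi):
    """Collect the ids of all cells that intersect the AOI, root first."""
    if not intersects(cell["extent"], aoi):
        return []  # nothing below this cell can intersect the AOI
    selected = [cell["id"]]
    for child in cell.get("children", []):
        selected.extend(cells_in_aoi(child, aoi))
    return selected


# A hypothetical one-level quadtree over a 16 x 16 extent.
root = {
    "id": "root",
    "extent": (0, 0, 16, 16),
    "children": [
        {"id": "sw", "extent": (0, 0, 8, 8)},
        {"id": "se", "extent": (8, 0, 16, 8)},
        {"id": "nw", "extent": (0, 8, 8, 16)},
        {"id": "ne", "extent": (8, 8, 16, 16)},
    ],
}
hits = cells_in_aoi(root, (1, 1, 3, 3))
```

Only the containers of the selected cells then need to be examined by the attribute and spatial constraints.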

First, an attribute-constrained search process is described. As
the traversal along a quadtree continues, an attribDict is used to
find all roads and government buildings. NIMA’s VPF uses the
Feature Attribute Coding Catalog (FACC), which indicates
whether a feature is a road (AP010, AP020, AP030) or a
government building (AH010, AH020, AH050, AH060,
AH070). Each instance of road also maintains an attribute LTN,
which provides the track or lane number. So, as the traversal of
the quadtree is in progress, those cells that have an entry for
the LTN attribute as well as any FACC of government buildings
will be collected into a collection of results.

The spatial search constraint is the user request to find features
that are close to feature B. In order to build asg indices for each
feature combination, a topological relationship among features
that meet the attribute constraints must be evaluated. In other
words, a spatial join among all road and government building
features in the AOI must be computed to determine if any
features are adjacent. Once the features are determined to be
adjacent, an asg value is computed. Thus, a spatial search
along a quadtree must first compute the topological relationship
among features. Then those features that meet the topological
relationship will be used to compute asg values among the
features.


8.  CONCLUDING  REMARKS 

Clearly, spatial data mining techniques are needed to enhance
the usability of the massive amounts of currently available spatial
data. The complexity of spatial data, with inherent composite
relationships and spatial and non-spatial attributes, warrants
some way in which to provide both a simplified meta-level
view of data to users and automatic discovery of rules
and relationships among the data. Object-oriented modeling
techniques are the accepted paradigm for representing spatial
data, and data mining practices fit well within this framework.
Many of the object-oriented principles, such as inheritance and
composition, can be exploited within a data mining context to
provide powerful inferencing mechanisms.

In  this  paper,  we  first  overviewed  the  principles  of  spatial  data 
mining within an object-oriented framework. It was then
shown  how  an  existing  fuzzy  spatial  relationship  model  could 
be  integrated  with  an  object-oriented  spatial  data  schema  to 
provide  relationship  definitions  for  high-level  fuzzy  querying  of 
such  relationships  by  the  users.  This  is  the  initial  step  in 
developing  a  spatial  data  mining  framework  based  on  abstract 
spatial  graphs. 

Although some work in the integration of data mining and
object-oriented modeling has been performed, the field is
certainly not mature as of this time. More work in formal
methods of integration is warranted, as is research in data
mining issues specific to spatial data. Our plans for future
work include an implementation of the model described in
section 5, as well as the development of a full-featured
non-spatial attribute data mining component.
Abstract

The  power  of  spatial  queries  for  analysis  and 
planning  purposes  in  many  different  application 
fields  has  drawn  significant  attention  within  the  GIS 
research  field.  The  extraction  of  meaningful 
information  from  spatial  data  requires  specialized 
data  structures,  query  languages  and  query  processing 
strategies. 

This  paper  is  primarily  concerned  with  the  binary 
data  structures  that  support  the  fuzzy  queries  of 
spatial  relationships  in  two  dimensions.  For 
implementation  purpose,  the  topological  relations  in 
this  model  are  refined  from  a  previously  defined 
model.  This  modified  binary  spatial  model  will 
reduce  the  burden  of  geometric  computation.  Based 
on  the  modified  binary  spatial  model,  a  CLIPS 
implementation  for  querying  binary  spatial 
relationships  is  investigated.  Details  about  the  query 
processing  strategies  are  also  provided. 

1.  Introduction 

Geographic Information Systems (GIS) is an
integrated technology that incorporates concepts from
computer graphics, spatial modeling and database
management. The ability to perform queries on
spatial data is essential to GIS and related systems.
Because the ability to extract information
for query results depends on the underlying
structure of the data, a great deal of research effort has
focused on the modeling of spatial data. It is worth
mentioning that the work in [1] provided a novel
contribution to the problem of defining spatial
relationships by considering inferences from
topological and directional relations.

In  earlier  work  [1],  a  spatial  data  model  that 
represents  binary  topological  and  directional 
relationships  between  two  2-D  objects  was  presented. 
A  data  structure  called  an  Abstract  Spatial  Graph 
(ASG)  was  defined  for  the  binary  relationship  that 
maintains  all  necessary  information  regarding 
topology  and  direction.  For  complete  information  on 
this  model,  we  refer  the  reader  to  the  cited  references. 
In this paper, we present an implementation of the
modified structures based on the C Language
Integrated Production System (CLIPS). CLIPS is a
productive development and delivery expert system
tool which provides a complete environment for the
construction of rule and/or object-based expert
systems [2, 3]. It is now maintained as public
domain software. Because of its portability,
extensibility, capabilities, and low cost, CLIPS has
received widespread acceptance throughout
government, industry and academia.

Based on the modified spatial relationship, rules are
encoded using the public-domain CLIPS language.
The CLIPS code is processed through the CLIPS
expert systems engine to answer the topological and
directional queries for binary spatial objects.

The paper is organized as follows. Section 2
describes the rules of spatial relations based on the
binary spatial model, and investigates a data structure
improvement for implementation purposes. Section 3
provides details about CLIPS programming strategies
for query processing. Section 4 presents query
results, providing a sample of how the
implementation works. Our conclusion and
directions for further work are presented in section 5.

2.  A  Binary  Spatial  Model  and  Its  Modification 
For the purpose of the model, we first assume that the
objects involved can be enclosed in Minimum
Bounding Rectangles (MBRs). Figure 1 shows two
MBR objects in two dimensions; each object can be
represented by a two-point abstraction consisting of
the lower-left and upper-right corners of the MBR.

A tuple [rx, ry] represents the relationship between the
objects in both the horizontal and vertical directions.
Each of rx and ry is one of Allen’s temporal relations
[4] that represents the interaction of the objects in the
x direction and y direction, respectively.

Figure 1. MBRs of two objects.

2.1  Basic  Rule  Sets  of  Binary  Spatial  Relations 
Based on Allen’s 13 temporal relations, a
two-dimensional spatial data model was developed. The
result is that 85 possible relationships are deduced,
which include 49 base relationships and 36 inverse
relations. These relationships are then used to define
topological and directional relationships.

In  this  paper,  three  rule  sets  are  used  to  represent  the 
basic  structure  of  this  model.  Consider  two  objects: 

A (Ax1, Ay1) (Ax2, Ay2)

B (Bx1, By1) (Bx2, By2)

Now, taking one coordinate from each point, that is,

(A1, A2) = (Ax1, Ax2) or (Ay1, Ay2)

(B1, B2) = (Bx1, Bx2) or (By1, By2),

we will present the implementation process.

Rule Set 1: Define a set of non-ambiguous
relationships.

Considering one direction, the temporal relation between
object A and object B can be defined as:


1.  IF < A2 < B1 >             THEN < before >
    IF < B2 < A1 >             THEN < before^-1 >

2.  IF < A2 = B1 >             THEN < meet >
    IF < B2 = A1 >             THEN < meet^-1 >

3.  IF < A1 < B1 < A2 < B2 >   THEN < overlap >
    IF < B1 < A1 < B2 < A2 >   THEN < overlap^-1 >

4.  IF < B1 < A1 < A2 = B2 >   THEN < finish >
    IF < A1 < B1 < A2 = B2 >   THEN < finish^-1 >

5.  IF < B1 < A1 < A2 < B2 >   THEN < during >
    IF < A1 < B1 < B2 < A2 >   THEN < during^-1 >

6.  IF < A1 = B1 < A2 < B2 >   THEN < start >
    IF < B1 = A1 < B2 < A2 >   THEN < start^-1 >

7.  IF < A1 = B1 < A2 = B2 >   THEN < equal >

Figure  2.  Defining  a  set  of  non-ambiguous  relations. 
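
As an illustration of Rule Set 1, the following Python sketch classifies one pair of intervals and then combines the x and y results into the tuple [rx, ry]. It is not the paper's CLIPS code. The function names, the `(x1, y1, x2, y2)` tuple layout, and the use of a trailing apostrophe for inverse relations are choices made for this example, and each interval is assumed to satisfy a1 < a2 and b1 < b2.

```python
def allen_relation(a1, a2, b1, b2):
    """Classify interval [a1, a2] against [b1, b2] following Rule Set 1.

    Returns the initial letter of the relation; inverses carry a trailing
    apostrophe and '=' stands for 'equal'.
    """
    if a2 < b1:
        return "b"
    if b2 < a1:
        return "b'"
    if a2 == b1:
        return "m"
    if b2 == a1:
        return "m'"
    if a1 == b1 and a2 == b2:
        return "="
    if a1 == b1:
        return "s" if a2 < b2 else "s'"
    if a2 == b2:
        return "f" if b1 < a1 else "f'"
    if b1 < a1 and a2 < b2:
        return "d"
    if a1 < b1 and b2 < a2:
        return "d'"
    # Only the two overlap cases remain at this point.
    return "o" if a1 < b1 else "o'"


def relation_2d(a, b):
    """Combine the x and y relations of two MBRs given as (x1, y1, x2, y2)."""
    rx = allen_relation(a[0], a[2], b[0], b[2])
    ry = allen_relation(a[1], a[3], b[1], b[3])
    return rx + ry


# Objects from the query-results example: A (1, 1) (5, 3) and B (4, 1) (8, 7).
pair = relation_2d((1, 1, 5, 3), (4, 1, 8, 7))
```

For these two objects the sketch yields `os` (overlap in x, start in y), and swapping the arguments yields the inverse pair `o's'`.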


Simply, this rule set can be expressed as:

rx = {b, m, o, f, d, s, =, b', m', o', f', d', s'},

where each relationship is represented by its initial
letter, e.g., 'b' -> 'before', and an inverse is marked
with an apostrophe.


In  the  y-direction,  the  same  rules  can  be  applied. 
Moreover,  there  are  two  additional  rules  that  apply: 

1.  A(rx^-1, ry)B = B(rx, ry^-1)A

2.  A(rx^-1, ry^-1)B = B(rx, ry)A

Rule  Set  2:  Define  a  set  of  topological  relationships. 

Based  on  the  eighty-five  basic  relationships,  the 
topological  relation  set  can  be  defined  as: 

T  ={disjoint,  tangent,  surrounded-by,  partially- 
surrounded,  surrounded-by,  partially-surrounds, 
overlapped-by,  overlaps,  x-subspace,  y-subspace, 
y-subspaced-by} 

Figure 3 shows a subset of the rules for topological
relationships. Further details are given in [1].


IF  <dd>  THEN  <A  surrounded-by  B> 

IF <oo' | os' | of'> THEN <A overlapped-by B>

IF <s= | d= | f= | == | o=> THEN <A x-subspace B>
IF <=s | =d | =f | == | =o> THEN <A y-subspace B>


Figure  3.  Topological  relationship  rule  set. 

Rule Set 3: Define the set of directional
relationships.

Directional  relationships  are  heavily  used  in  everyday 
life.  The  most  commonly  used  are  the  cardinal 
directions  and  their  refinements.  In  the  same  way  as 
previously  seen  for  topological  relationships,  the 
directional  set  can  be  defined  as: 

D  = {North,  East,  South,  West,  North-East, 
South-East,  South-West,  North-West} 

Figure  4  shows  two  of  the  rules  for  directions. 


IF <dd | df | fd | do | ds | ff | d= | fo | fs | f= |
    dd' | do' | ds' | fd' | df' | fo' | fs'>
THEN <A East B>

IF <dd | do | ds | fo | fs | db | dm |
    fb | fm | dd' | fd' | df | ff'>
THEN <A South-East B>


Figure  4.  Two  directional  relationship  rules. 

2.2  Define  ASG  for  Fuzzy  Querying 
The three basic rule sets support basic binary
spatial querying, i.e., querying without specific
degree information. Researchers [5-6] have shown
that directional relationships are fuzzy concepts,
since they depend on human interpretation. In
addition to supplementary information needed for
fuzzy query processing, a data structure known as an
abstract spatial graph (ASG) was also presented in
previous work [1]. The concept is based on the tasks
of defining reference areas, partitioning MBRs into
object sub-groups, and assigning each object
sub-group to a node on the ASG.
Figure 5. A [overlaps, start] B and corresponding
ASG (panels: ASG for Object A, ASG for Object B).


Figure 5 shows the geometry of the ASG, in which
nodes 0-3 belong to object B and nodes 0 and 7
represent object A. Each node has associated weights
that store fuzzy information.

2.3  Modifying ASG for CLIPS Implementation
The topological relations have been found useful for
increasing the speed of spatial queries [5]. For
implementation purposes, we analyze the geometric
characteristics of topological relationships. Except for
the disjoint relation, all relations have a similar
geometry; that is, the reference area is part of both
objects involved. Thus, the original topological
relation set can be reduced or reclassified to a binary
topological set:

T -> T’ = {disjoint, connected}

This  new  topological  relation  set  is  used  in  the  CLIPS 
implementation. 
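
One way to picture the reduction is the following Python sketch. It rests on an assumption that the paper does not state explicitly: two MBRs are taken to be disjoint exactly when they are separated along at least one axis (a `before` relation or its inverse), and every other combination, including tangent cases, is treated as connected. The function name and the apostrophe notation for inverses are choices made for this example.

```python
def reduced_topology(rx, ry):
    """Map a 2-D relation [rx, ry] onto the reduced set {disjoint, connected}.

    Assumption: separation along either axis ('b' or its inverse "b'")
    makes the two MBRs disjoint; anything else counts as connected.
    """
    separated = {"b", "b'"}
    if rx in separated or ry in separated:
        return "disjoint"
    return "connected"
```

Under this reading, the pair `bd` is disjoint, while `os` is connected.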

For convenience of implementation and further
investigation, the ASG is modified by mapping
topological relationships to 9 nodes for both objects.
Figure 6 represents the new ASG. As before, each
node has associated weights. Unlike the original ASG,
however, the weight in some nodes can be null, depending
on the topological relation. In this new data
structure, because each object is associated with its 9
nodes, it is not necessary to keep information about
whether a node belongs to object A or object B in
the implementation. Furthermore, it is a flexible
structure for fuzzy querying.

Figure 6. A [overlaps, start] B and new ASG

3.  A CLIPS Implementation

In this section we show how CLIPS can be used to
implement the binary spatial relationships given
earlier. Considering the amount of computation
involved in the implementation, we take advantage of
the deffunction construct, which allows the addition of
new functions without having to recompile and relink
CLIPS. Several user-defined functions are written
using the CLIPS deffunction construct, and these are
executed by CLIPS interpretively.

As a rule-based shell, CLIPS stores knowledge in
rules, which are logic-based structures. In the
implementation, the three basic rule sets are defined
using defrule constructs. They provide basic
spatial information such as “Object A is disjoint from
Object B” or “Object A is West of Object B.” For fuzzy
querying purposes, extra functions and rules are
defined.

The  implementation  is  directly  dependent  upon  the 
reduced  topological  relation  set  and  modified  ASG 
mentioned  above. 

3.1  Store  All  Facts  in  CLIPS 
The facts are the critical resources for querying.
All details for binary spatial relations are contained in
deftemplate facts. The information stored in
the database includes the positions of the two objects, the
reference object, the non-ambiguous relations, and
the topological and directional relationships.
Figure 7 shows the information stored in facts using
CLIPS syntax. The corresponding data structures are
declared using deftemplate syntax.
(object-position (objectname A)
   (x1 2) (y1 2) (x2 3) (y2 5))

(2D-relation (object1 A)
   (relations bd) (object2 B))

(topological-relationship (object1 A)
   (t_relation disjoint) (object2 B))

(directional-relationship (object1 A)
   (d_relation West) (object2 B))

(nodes (objectname A)
   (Central area_weight)
   (N area_weight direction_weight)
   (NW area_weight direction_weight)
)


Figure  7.  Facts  stored  in  database. 

3.2  Representing  2-D  Relation  in  CLIPS 
To represent 2-D temporal relations extended from
Allen’s relations, the deffunction construct in CLIPS
is utilized. With this construct, a new function that
implements Allen’s relations in 1-D is defined
directly in CLIPS. Figure 8 shows the deffunction for
Allen’s interval relations. The knowledge of the rules
in Rule Set 1, which define a set of
non-ambiguous relationships, is built by the defrule
construct shown in Figure 9.


; Only the seven base relations of Rule Set 1 appear in this figure.
(deffunction AllenRelation
   (?A1 ?A2 ?B1 ?B2)
   (if (< ?A2 ?B1) then (bind ?relation b))
   (if (= ?A2 ?B1) then (bind ?relation m))
   (if (and (< ?A1 ?B1) (< ?B1 ?A2) (< ?A2 ?B2))
      then (bind ?relation o))
   (if (and (< ?B1 ?A1) (= ?A2 ?B2))
      then (bind ?relation f))
   (if (and (< ?B1 ?A1) (< ?A2 ?B2))
      then (bind ?relation d))
   (if (and (= ?B1 ?A1) (< ?A2 ?B2))
      then (bind ?relation s))
   (if (and (= ?B1 ?A1) (= ?A2 ?B2))
      then (bind ?relation =))
   (return ?relation)
)


Figure  8.  Deffunction  for  Allen’s  interval  relations. 

The function AllenRelation( ) develops a set of
temporal relations in 1-D. It accepts four arguments
from a CLIPS program. When it is called, it returns
the temporal relation that can be used in the rule for
the application.

The  defrule  collects  the  relation  facts  in  2-D  by 
calling  AllenRelation(  ),  and  then  puts  the  2-D 
relation  knowledge  into  facts. 


(defrule define-2D-relation
   ?f3 <- (object-position (objectname ?A&A)
             (x1 ?Ax1) (y1 ?Ay1) (x2 ?Ax2) (y2 ?Ay2))
   ?f4 <- (object-position (objectname ?B&B)
             (x1 ?Bx1) (y1 ?By1) (x2 ?Bx2) (y2 ?By2))
   =>
   ; 1-D relation along each axis
   (bind ?x (AllenRelation ?Ax1 ?Ax2 ?Bx1 ?Bx2))
   (bind ?y (AllenRelation ?Ay1 ?Ay2 ?By1 ?By2))
   ; concatenate the two symbols, e.g. o and s give os
   (bind ?r (sym-cat ?x ?y))
   (assert (2D-relation (object1 ?A)
              (relations ?r) (object2 ?B))))

Figure  9.  Defrule  to  implement  Rule  Set  1. 

3.3  Basic  Binary  Spatial  Querying  Using  CLIPS 
The basic queries are based on the primary
topological set (Rule Set 2) and directional set (Rule
Set 3). In this kind of querying, the degree to which
one object lies in a particular direction with respect to
a second object is not of concern. Figures 10 and 11
show CLIPS rule structures for the topological
relationship and the directional relationship, respectively.


(defrule define-topological-relation
   (2D-relation (object1 ?A&A)
      (relations ?r) (object2 ?B&~A))
   =>
   (if (eq ?r dd)
      then (bind ?tr "is surrounded by"))
   (if (numberp (member$ ?r
                   (create$ oo' os' of')))
      then (bind ?tr "is overlapped by"))
   (assert (topological-relationship
              (object1 ?A) (t_relation ?tr)
              (object2 ?B))))


Figure  10.  Defrule  for  topological  relationship. 


(defrule define-directional-relation
   (2D-relation (object1 ?A&A)
      (relations ?r) (object2 ?B&~A))
   =>
   (if (numberp (member$ ?r (create$ od of
                   sd sf dd df fd ff =d =f
                   ob' om' oo' os' ...)))
      then (bind ?dr1 North))

   (loop-for-count (?count 1 8) do
      (bind ?dr (nth$ ?count (create$ ?dr1
                   ?dr2 ?dr3 ... ?dr7 ?dr8)))
      (if (numberp (member$ ?dr (create$
                      North East ... West)))
         then
         (assert (directional-relationship
                    (object1 ?A) (d_relation ?dr)
                    (object2 ?B))))))


Figure  11.  Defrule  for  directional  relationship. 
3.4  Fuzzy Querying of Binary Spatial
Relationships

Based  on  the  new  topological  relation  set  and 
modified  ASG  data  structure,  we  define  three  rules 
and  four  functions  to  support  the  processing  of  fuzzy 
queries.  Query  processing  strategies  are  described  as 
follows: 

Step 1. Find the reference area

Fuzzy  variable  weights  store  all  fuzzy  query 
information.  In  order  to  get  weights  for  each  node  in 
the  ASG,  a  reference  area  must  first  be  found.  Based 
on  the  fact  that  there  are  four  points  in  the  x-direction 
or  y-direction,  given  two  objects,  a  simplified 
approach  to  determine  the  reference  area  can  be 
given. 

Approach: The reference area is also treated as an
MBR object. We take the two middle points among the
four points in each direction as the reference object
position. It can be represented as R = (Rx1, Ry1) (Rx2,
Ry2).

get-reference-object Rule and reference Function

Given two objects, the get-reference-object rule calls the
reference function to get the reference object
position. The reference function accepts eight
arguments that represent the positions of the two MBR
objects, and finds the position of the reference
object. Finally, it places the position information into
the corresponding object-position fact.
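
The middle-two-points approach can be sketched in a few lines of Python. This is an illustration rather than the paper's reference function: the name `reference_object` and the `(x1, y1, x2, y2)` tuple layout are assumptions made for the example.

```python
def reference_object(a, b):
    """Return the reference MBR (Rx1, Ry1, Rx2, Ry2) for two MBRs.

    Each MBR is (x1, y1, x2, y2). In each direction, the four coordinates
    of the two objects are sorted and the two middle values are kept.
    """
    xs = sorted([a[0], a[2], b[0], b[2]])
    ys = sorted([a[1], a[3], b[1], b[3]])
    return (xs[1], ys[1], xs[2], ys[2])


# Objects from the query-results example: A (1, 1) (5, 3) and B (4, 1) (8, 7).
ref = reference_object((1, 1, 5, 3), (4, 1, 8, 7))
```

For these two objects the sorted x values are 1, 4, 5, 8 and the sorted y values are 1, 1, 3, 7, so the reference object is (4, 1) (5, 3).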

Step 2. Calculate weights

Based  on  the  binary  topological  relations,  a  general 
method  developed  for  connected  relations  is  shown  in 
Figure  12. 


N_area  = (Rx2 - Rx1) (Oy2 - Ry2)
NE_area = (Ox2 - Rx2) (Oy2 - Ry2)
E_area  = (Ox2 - Rx2) (Ry2 - Ry1)
SE_area = (Ox2 - Rx2) (Ry1 - Oy1)
S_area  = (Rx2 - Rx1) (Ry1 - Oy1)
SW_area = (Rx1 - Ox1) (Ry1 - Oy1)
W_area  = (Rx1 - Ox1) (Ry2 - Ry1)
NW_area = (Rx1 - Ox1) (Oy2 - Ry2)

Figure  12.  Formulas  for  area  weight  calculation. 

In the figure, R represents the reference object, and O
represents whichever of the two objects is being investigated. By
adding some constraints, the general method for
connected relations can also be applied to disjoint
relations.
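
For the connected case, the area weights can be sketched in Python as below. The sketch assumes that the reference object R lies inside the object O, so that extending the edges of R splits O into a central cell and eight directional cells, and it normalizes each area by the total area of O, as the weights function does for the central and north nodes. The function name and tuple layout are hypothetical.

```python
def area_weights(o, r):
    """Return normalized area weights for the nine nodes of object o.

    o and r are MBRs given as (x1, y1, x2, y2), with r inside o
    (the connected case). Keys are C, N, NE, E, SE, S, SW, W, NW.
    """
    ox1, oy1, ox2, oy2 = o
    rx1, ry1, rx2, ry2 = r
    total = (ox2 - ox1) * (oy2 - oy1)
    # Widths of the three columns and heights of the three rows.
    west, mid_x, east = rx1 - ox1, rx2 - rx1, ox2 - rx2
    south, mid_y, north = ry1 - oy1, ry2 - ry1, oy2 - ry2
    areas = {
        "C": mid_x * mid_y,
        "N": mid_x * north,
        "NE": east * north,
        "E": east * mid_y,
        "SE": east * south,
        "S": mid_x * south,
        "SW": west * south,
        "W": west * mid_y,
        "NW": west * north,
    }
    return {node: area / total for node, area in areas.items()}


# Object A (1, 1) (5, 3) with reference object R (4, 1) (5, 3).
weights_a = area_weights((1, 1, 5, 3), (4, 1, 5, 3))
```

For object A of the query-results example, three quarters of the area falls in the west node and one quarter in the central node; all other nodes receive zero.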


get-weight  Rule  and  weights  Function 

Given two objects and their reference object, the
weights function maps the object sub-groups into 9
nodes for each object, and calculates the area weights
and node weights. The CLIPS program passes nine
arguments to the weights function: one for the object
identifier, four for the object position, and four for the
reference position. The function asserts the weights
to the corresponding nodes for fuzzy querying. The
basic structure of the weights function is shown in Figure
13. A related rule, get-weight, activates the weights function.

; Only the central and north nodes are shown; ?Longest is assumed
; to be initialized in the part of the function omitted from the figure.
(deffunction weights (?object ?Ox1 ?Oy1
                      ?Ox2 ?Oy2 ?Rx1 ?Ry1 ?Rx2 ?Ry2)
   (bind ?Total_area (* (- ?Ox2 ?Ox1)
                        (- ?Oy2 ?Oy1)))
   (bind ?C_area (/ (* (- ?Rx2 ?Rx1)
                       (- ?Ry2 ?Ry1)) ?Total_area))
   (if (and (<= ?Ox1 ?Rx1) (>= ?Ox2 ?Rx2))
      then
      (bind ?N_area (/ (* (- ?Rx2 ?Rx1)
                          (- ?Oy2 ?Ry2)) ?Total_area))
      else
      (bind ?N_area 0)) ; for disjoint case
   ; get the node weight for north direction
   (if (> ?N_area 0)
      then (bind ?N_len (- ?Oy2 (/ (+ ?Ry2
                                      ?Ry1) 2)))
      else (bind ?N_len 0)) ; keeps ?N_len bound when the node is empty
   (if (< ?N_len 0) then (bind ?N_len 0))
   (if (> ?N_len ?Longest)
      then (bind ?Longest ?N_len))

   (assert (nodes (objectname ?object)
              (C ?C_area)
              (N ?N_area (/ (* ?N_area ?N_len)
                            ?Longest))
              ...)))

Figure  13.  Function  to  calculate  weights. 

Step 3. Get qualifier to implement fuzzy querying

To provide support for fuzzy query processing, each
fuzzy weight is assigned to the
corresponding linguistic term, or qualifier. The fuzzyTq
function defines the topological qualifiers that
represent the linguistic terms for area weight.
Similarly, the fuzzyDq function defines the directional
qualifiers that represent the linguistic terms for node
weight.

The fuzzy set for topological qualifiers is:

{all (0.96 - 1), most (0.6 - 0.95), some (0.3 - 0.59),
little (0.06 - 0.29), none (0 - 0.05)}

The fuzzy set for directional qualifiers is:

{directly (0.96 - 1), mostly (0.6 - 0.95), somewhat
(0.3 - 0.59), slightly (0.06 - 0.29), not (0 - 0.05)}
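
The mapping from a weight to a qualifier amounts to a threshold lookup, sketched here in Python. This does not reproduce the paper's fuzzyTq and fuzzyDq functions, whose bodies are not shown; the name `qualifier` and the two tuples are hypothetical. Because the published ranges leave small gaps (for example between 0.95 and 0.96), the sketch assumes that weights are rounded to two decimals before the lower bounds are compared.

```python
TOPOLOGICAL = ("all", "most", "some", "little", "none")
DIRECTIONAL = ("directly", "mostly", "somewhat", "slightly", "not")


def qualifier(weight, terms):
    """Map a weight in [0, 1] to one of five linguistic terms.

    terms lists the qualifiers from the highest range to the lowest.
    Assumption: the weight is rounded to two decimals first, so the
    lower bounds 0.96, 0.6, 0.3 and 0.06 partition the unit interval.
    """
    w = round(weight, 2)
    for lower_bound, term in zip((0.96, 0.6, 0.3, 0.06), terms):
        if w >= lower_bound:
            return term
    return terms[-1]


area_term = qualifier(0.75, TOPOLOGICAL)
node_term = qualifier(0.75, DIRECTIONAL)
```

An area weight of 0.75 therefore reads as "most", and a node weight of 0.75 as "mostly".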
The fuzzy-query rule in Figure 15 provides the fuzzy
querying information by calling the fuzzyTq and fuzzyDq
functions.


(defrule fuzzy-query
   ?f3 <- (nodes (objectname ?A&A)
             (C ?C_area)
             (N ?N_area ?N_len)
             (NW ?NW_area ?NW_len))
   =>
   (if (neq ?A B) then (bind ?obj B))
   (loop-for-count (?count 1 8) do
      (bind ?dir (nth$ ?count (create$ North
                    ... North-West)))
      (bind ?area_w (nth$ ?count (create$
                       ?N_area ?NE_area ?E_area
                       ... ?SE_area ?NW_area)))
      (bind ?node_w (nth$ ?count (create$
                       ?N_len ?NE_len ?E_len
                       ... ?W_len ?NW_len)))
      (bind ?tq (fuzzyTq ?A ?area_w
                   ?dir ?obj))
      (bind ?dq (fuzzyDq ?A ?node_w
                   ?dir ?obj))
      (if (and (neq ?tq non) (neq ?dq non))
         then
         (printout t "query information" crlf))))


Figure 15. Fuzzy query rule.

All  of  the  CLIPS  code  is  processed  through  the 
CLIPS  expert  systems  engine  to  answer  the 
topological  and  directional  queries  for  binary  spatial 
objects. 

4.  Query  Results 

Consider  two  objects: 

object A (1, 1) (5, 3) and
object B (4, 1) (8, 7).

When the define-2D-relation rule is fired, calling
AllenRelation (1 5 4 8) will return ‘o’, and the
second call, AllenRelation (1 3 1 7), will return
‘s’. Finally, the relation ‘os’ is added to the temporal-
relation fact.

When the define-topological-relation rule is fired, the
topological information ‘Object A overlaps Object B’
is displayed. When the define-directional-relation
rule is fired, ‘Object A is South of Object B, Object A is
South West of Object B, and Object A is West of
Object B’ are provided for directional relations.
When the reference rule is fired, the reference object
R (4, 1) (5, 3) is asserted into the fact database. While
the get-weight rule is firing, area weights and node
weights are assigned to the 9 nodes for each object.


Finally,  the  fuzzy-query  rule  fires,  providing  the 
following  fuzzy  querying  information: 

Most  of  Object  A  is  West  of  Object  B 
Object  A  is  mostly  West  of  Object  B 
=>  Most  of  Object  A  is  mostly  West  of  Object  B 

5.  Conclusion  and  Directions  for  Further  Works 
In  this  paper,  the  capabilities  of  a  binary  spatial  data 
model  and  a  CLIPS  tool  to  support  fuzzy  topological 
and  directional  queries  have  been  shown.  The  results 
demonstrate  that  CLIPS  is  a  flexible,  powerful,  and 
intuitive  tool  that  can  be  successfully  applied  to 
spatial  database  analysis. 

Because the querying involves handling concepts
expressed in verbal language, such as direction, area
weights and node weights, this kind of query is
illustrative of problems that involve uncertainties.
However, in this implementation, the representation
of the fuzzy variable weight is based on classical set
theory, where membership can be clearly defined
by a set. It therefore performs only a low-level fuzzy query.

In  the  future  we  plan  to  continue  research  on  the  use 
of CLIPS in spatial data analysis. We intend to
investigate  also  the  use  of  FuzzyCLIPS  for  high  level 
information  queries,  in  which  the  representation  of 
weights  information  is  based  on  the  concept  of  fuzzy 
set theory.

Abstract


The Geospatial Information Database (GIDB) is an
implementation of ongoing research in object-oriented
geographic data modeling at the Naval Research
Laboratory's Mapping, Charting & Geodesy Branch. The
GIDB has evolved over the last five years from the initial
memory-resident application involving vector marine
data, to the current state-of-the-art system of a distributed
object-oriented database with web-based viewing
capabilities for vector, raster, hypertext and multimedia
data, as well as remote updating of vector data. The use
of geographic data is becoming pervasive across many
disciplines. At the same time, end users are becoming
increasingly dependent upon the web as a source of
readily available, easily accessible information. We
believe these two factors necessitate the development of
systems capable of the immediate distribution and access
of complex spatial data objects. In this paper, we present
the design strategies and implementation architecture of
the GIDB.


1. Introduction and background

The Defense Modeling and Simulation Office (DMSO)
and the National Imagery and Mapping Agency (NIMA)
sponsored a FY 94 pilot project at the Naval Research
Laboratory (NRL) at Stennis Space Center to produce a
prototype object-oriented (OO) database with NIMA's
first digital vector mapping prototype, Digital Nautical
Chart (DNC). NRL teamed with the University of Florida
to develop this prototype.

DNC was the first Vector Product Format (VPF) [5]
dataset implemented by NIMA. VPF is a relational file
format that represents geographic entities (features), along
with their spatial and non-spatial attributes, through the
use of tables. The project addressed the following areas:
topological support among coverages, potential for not
duplicating features among coverages, improved updating
potential, and increased access speed. At project
completion, key findings reported to NIMA included the
following:

•  An OO Database with DNC information content
could be implemented.

•  Feature  content  was  only  stored  once  as  compared 
to  the  repeated  storage  of  some  features  in  the 
conventional  DNC. 

•  Direct  updating  of  features  and  attributes  was 
demonstrated. 

•  An  order  of  magnitude  speedup  was  demonstrated 
in  feature  access  time. 

•  Features and attributes can easily be modified via
a point-and-click interface.

NRL extended the prototype OO structure in FY 95 to
accommodate multiple VPF databases as well as two
different OO database management systems (ODBMS)
[12]. This prototype is called the Object-Oriented Vector
Product Format (OVPF), and represents a transformation
of NIMA VPF relational databases into an OO structure.
The OVPF allowed NIMA a rapid look at the potential
benefits of OO approaches for allowing Department of
Defense users to ask for information from NIMA that
spans multiple databases [14].

During FY 96, much progress on conflation [2], [7],
[8] and the base research for an integrated OO framework
[13] to support multiple NIMA data types was
accomplished. In FY 97, NRL developed the first
prototype of the integrated OO framework, as well as
developing a prototype OO Digital Nautical Chart
updating system for NIMA. This system showed a 24:1
speedup over NIMA's current approach. Also, NRL
developed the initial CORBA interface for the integrated
framework to allow improved electronic dissemination of
digital mapping objects [3], [16].

Marine Corps Warfighting Lab (MCWL) funded the
NRL to develop the initial Geospatial Information
Database (GIDB) during FY 98 and FY 99 to support an
Integrated Marine Corps Multi-Agent Command and
Control System (IMMACCS). This initial prototype
demonstrated an ability to actively manage NIMA
mapping data in an object-oriented manner that could be
interfaced with components developed by Stanford
Research Institute, California Polytechnic Institute, and
NASA's Jet Propulsion Laboratory. The GIDB
demonstrated that this integrated database could be
viewed and engaged in both 2D and 3D via a simple
Internet browser. The GIDB was a component of the
IMMACCS successfully used in the Urban Warrior
Advanced Warfighting Experiment in March 1999.

The underlying motivation for having an Internet-
based Java client access our OO mapping database is to
give end users the ability to access and use NIMA data
quickly and efficiently. Currently, users of NIMA data
must have resident on their own computer systems
software to view the data, and must obtain the data on
CD-ROM or other storage media. However, given
NIMA's role as the primary geographic data distributor
for the Department of Defense, it is clear that electronic
dissemination and remote updating of NIMA's digital
products is highly desirable. To this end, the GIDB allows
any user with a Java-enabled web browser, such as
Netscape 4.0, to access our database over the Internet and
display NIMA map data available in their area of interest.
Additionally, privileged users, known as data co-
producers, are given the ability to perform remote updates
on the data.

This  paper  presents  the  design  and  underlying 
architecture  of  the  GIDB  as  used  in  the  IMMACCS 
project  described  above.  The  remainder  of  this  paper  is 
organized  as  follows.  Section  2  provides  a  high-level 
description  of  the  overall  architecture,  followed  by 
emphasis  on  the  distributed  aspects  of  the  architecture  in 
section  3.  Section  4  focuses  on  the  web  applet,  which  is 
the  mechanism  by  which  remote  viewing  and  updating  of 
information  is  performed.  Section  5  contains  details  of  the 
network  updating  scheme,  and  includes  a  description  of 
the  IMMACCS  updating  test.  In  section  6,  we  present  our 
concluding  remarks  and  indicate  directions  for  future 
work. 

2.  GIDB  architecture 

The  Geospatial  Information  Database  (GIDB)  system  has 
a  client  and  server  architecture.  It  is  composed  of  server, 
interface  and  client  modules.  Currently,  GIDB  has  three 
clients  as  shown  in  figure  1. 

2.1. Server

Gemstone is a commercial-off-the-shelf object server
that stores, manipulates, and processes objects referenced
by each client. The server consists of two functional
modules: storage of data, and manipulation or processing
of data. Based on the request from each client, the
Gemstone server searches and retrieves only those objects
that meet the requested criteria. Data search for retrieval
is performed mostly on the server for all three clients.
Gemstone is an intelligent object server; it knows its
objects by name. Therefore, Gemstone maintains its own
object names. An object can be searched and retrieved by
specifying its name. All disk-based systems involve a
fetch at a page level. Sometimes the exact content of a
page may not be explicitly known for most servers.
However, for Gemstone as an object-based system, the
content of a page can be known at an individual object
level. Processing to determine what is on the page can
take place on the server rather than on the client.

A  server  maintains  its  VPF  data  in  the  following 
manner.  Entry  points  for  all  three  clients  are  at  the 
VPFDatabase  class  level.  VPFDatabase  class  is  the 
superset of all VPF data. VPFDatabase class has a class
variable  or  a  global  dictionary  called  Databases  that 
contains  all  instances  of  the  VPFDatabase  class.  A  root 
entry  to  any  feature  access  begins  with  the  Databases  of 
VPFDatabase  class. 

VPF  data  has  a  hierarchical  structure.  A  database  is 
used  to  group  a  set  of  data  that  is  used  for  a  specific 
purpose, e.g., Digital Nautical Chart (DNC) for
navigation.  It  contains  a  collection  of  libraries.  A  library 
is  used  to  group  those  features  that  are  collected  at  a 
certain  scale  over  a  certain  region.  There  may  be  some 
overlap or complete containment of one library within
another.  However,  each  library  is  unique  based  on  the 
region  and  scale.  Each  library  subsequently  contains  a 
collection  of  coverages,  where  each  coverage  contains 
those  features  that  are  related  by  a  common  theme,  e.g., 
transportation  or  cultural. 

A database, library and coverage triad, represented as
VPFDatabase, VPFLibrary, and VPFCoverage classes,
uniquely identifies a feature. A feature is defined at a
coverage  level.  Due  to  tabular  storage  constraints,  VPF 
data  structure  groups  data  at  yet  another  layer,  the  tile. 
Each  tile  consists  of  some  geographic  extent  in  a  minute 
by  minute  or  a  degree  by  degree  manner.  Figure  2  shows 
an  example  of  a  VMAAWE  database  having  a  collection 
of  libraries  such  as  Presidio,  Oak  Knoll,  etc.  A  Monterey 
library  consists  of  coverages  or  themes  such  as 
population,  transportation,  etc. 
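
The containment hierarchy and the class-level Databases dictionary described above can be sketched in Python. This is a minimal illustration rather than the Smalltalk implementation: the class names and the Databases registry come from the text, while the method names add_library and add_coverage, the helper find_feature, and the feature record layout are assumptions made for this sketch.

```python
class VPFCoverage:
    """A thematic grouping of features, e.g. transportation."""

    def __init__(self, name):
        self.name = name
        self.features = {}  # feature id -> feature record (hypothetical layout)


class VPFLibrary:
    """Features collected at a certain scale over a certain region."""

    def __init__(self, name):
        self.name = name
        self.coverages = {}  # coverage name -> VPFCoverage

    def add_coverage(self, name):
        return self.coverages.setdefault(name, VPFCoverage(name))


class VPFDatabase:
    """Root of the hierarchy; every instance registers itself by name."""

    Databases = {}  # class-level dictionary holding all VPFDatabase instances

    def __init__(self, name):
        self.name = name
        self.libraries = {}  # library name -> VPFLibrary
        VPFDatabase.Databases[name] = self

    def add_library(self, name):
        return self.libraries.setdefault(name, VPFLibrary(name))


def find_feature(database, library, coverage, feature_id):
    """Resolve a feature from the database, library, coverage triad plus an id."""
    db = VPFDatabase.Databases[database]
    return db.libraries[library].coverages[coverage].features[feature_id]


# Usage: the database and library names follow the example in the text.
db = VPFDatabase("VMAAWE")
cov = db.add_library("Monterey").add_coverage("transportation")
cov.features[1] = {"featname": "road", "coords": [(0, 0), (1, 1)]}
```

Every lookup starts from VPFDatabase.Databases, mirroring the statement that the root entry to any feature access begins with the Databases dictionary.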


Figure 1. GIDB system component.

Figure 2. [...] classes.

A server is designed to use a coverage as the minimal
grouping level for features or objects. Every instance of a
VPFCoverage has an instance of a dictionary collection
called covQuad. A covQuad maintains all instances of
VPFSpatialDataManager for a given coverage. A
VPFSpatialDataManager class represents a spatial
indexing scheme for organizing spatial data. The GIDB
system uses a quadtree spatial indexing scheme to provide
a hierarchical clustering of data based on the geographic
area. A quadtree recursively divides an area into
quadrants, each of which is called a quadcell. A detailed
discussion of a quadtree indexing scheme can be found
in [11]. In the GIDB system design, the
VPFSpatialDataManager class implements the
quadtree indexing scheme. All spatial objects or features
are stored and indexed in a quadtree. An insertion of an
object into a quadtree is based on the bounding box of the
object. A quadcell that will minimally contain the
bounding box of an object is selected to store the object.

VPF data has three types of features: point, line and
area (polygon). For efficient and faster access and
retrieval, each feature type has a unique instance of a
quadtree, i.e., there are three instances of
VPFSpatialDataManager class. Therefore, a covQuad has
instances of VPFSpatialDataManager keyed by the
feature type.
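
The bounding-box insertion rule can be illustrated with a small Python quadtree. This is a sketch under stated assumptions, not the GIDB code: the class name QuadCell, the max_depth limit, and the covQuad dictionary keys are hypothetical, and the cell that "minimally contains" a bounding box is taken to mean the deepest quadcell that fully encloses it.

```python
class QuadCell:
    """One cell of a quadtree; objects live in the smallest enclosing cell."""

    def __init__(self, xmin, ymin, xmax, ymax, depth=0, max_depth=8):
        self.bounds = (xmin, ymin, xmax, ymax)
        self.depth = depth
        self.max_depth = max_depth  # assumed limit on subdivision
        self.objects = []
        self.children = None

    def _contains(self, box):
        xmin, ymin, xmax, ymax = self.bounds
        return (xmin <= box[0] and ymin <= box[1]
                and box[2] <= xmax and box[3] <= ymax)

    def _split(self):
        xmin, ymin, xmax, ymax = self.bounds
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        d, m = self.depth + 1, self.max_depth
        self.children = [
            QuadCell(xmin, ymin, xm, ym, d, m),
            QuadCell(xm, ymin, xmax, ym, d, m),
            QuadCell(xmin, ym, xm, ymax, d, m),
            QuadCell(xm, ym, xmax, ymax, d, m),
        ]

    def insert(self, obj, box):
        """Store obj in the deepest cell that fully contains box; return that cell."""
        if self.depth < self.max_depth:
            if self.children is None:
                self._split()
            for child in self.children:
                if child._contains(box):
                    return child.insert(obj, box)
        # No child encloses the box (or the depth limit is reached).
        self.objects.append(obj)
        return self


# One quadtree per feature type, keyed as in a covQuad (keys are assumed).
cov_quad = {kind: QuadCell(0, 0, 16, 16, max_depth=4)
            for kind in ("point", "line", "area")}
```

A small feature descends to a deep quadcell, whereas a feature whose bounding box straddles the center of the area stays at the root, which is why large or awkwardly placed features are found near the top of the tree.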

Any data access and retrieval begin by specifying the
database, library and coverage. A feature retrieval may
specify a part of an area or an area of interest (AOI) by
specifying a geographic extent or the entire area of the
database and library. This request is sent to the
appropriate instance of VPFSpatialDataManager for
actual feature retrieval.

2.2. Interface

Both GemBuilder for Smalltalk and GemORB are
commercial-off-the-shelf products. Both components
provide an interface to the Gemstone object server.
GemORB is a Common Object Request Broker
Architecture (CORBA) 2.0-compliant object request
broker. GemBuilder for Smalltalk is an interface between
the AOI-based client and Gemstone.

GemBuilder  for  Smalltalk  maintains  its  own  object 
names  as  well.  To  establish  a  connection  between  an 
AOI-based  client  and  Gemstone,  a  naming  convention  of 
each  object  must  be  resolved.  In  other  words,  the  client 
and  server  must  have  an  agreement  on  how  to  reference 
an  object  by  name.  GemBuilder  for  Smalltalk  provides 
those  classes  that  institute  a  convention  for  referencing 
the  same  objects  between  the  AOI-based  clients  and 
Gemstone.  For  this  reason,  GemBuilder  for  Smalltalk 
requires some knowledge of the database design and
implementation;  the  level  of  required  detail  is  client 
dependent. 

GemORB establishes a connection to the object server
through CORBA-compliant communication. For more on
CORBA, see [9], [10], and [15]. GemORB provides those
classes that represent and implement CORBA. Unlike
GemBuilder for Smalltalk, a connection via GemORB
does not require an in-depth knowledge of the system
design and implementation. An Interface Definition
Language (IDL) file defines a correct mapping of objects
between the client and the server. An IDL file also defines
operations or methods that are available for clients to
invoke on the server. Since GemORB is based on
CORBA, all the benefits of interoperability among
programming languages and platforms apply. Figure 3
shows the difference between the GemBuilder for
Smalltalk and GemORB based applications.


Figure  3.  Gemstone  object  server 
configuration. 


2.3. Client

An AOI client connects to the object server through
GemBuilder for Smalltalk. This client mainly populates,
maintains, updates and exports data. This client is tightly
coupled to the Gemstone design of data, i.e., class
definition, class states and behaviors. A similar class
definition is used between Gemstone and an AOI client;
an AOI client closely replicates the object server's design
of data. Due to the data encapsulation property, a
reference to an object implies a reference to a self-
contained object. For those objects that are maintained
and managed by GIDB, a self-contained object can
consist of a large web of references to other objects, i.e.,
pointers. Since an object referenced by an AOI client is
self-contained, AOI clients primarily request the object
server to search and return objects. In most cases, AOI
clients then process the data on the client side.

A GemORB based client, on the other hand, does not
reflect the server's design. These clients minimize
information maintenance and storage by relying on the
object server to be a centralized data storage as well as a
centralized processing center. A GemORB client request
for information expects the object server to search and
completely process information. A client will receive
fully processed information that can be readily used
without further processing. These clients expect an
answer to a question; AOI clients expect from the object
server those parts that are needed to solve and derive the
solution. Thus, AOI-based clients can be considered as
“fat clients,” because the implementation details are
replicated on the clients, adding storage requirement.
They are expected to process the information retrieved
from the object server. The GemORB-based clients,
however, are considered as “thin clients,” because the
implementation of those objects is not represented on the
clients; there is not much processing involved on the
client side. This paper will concentrate on the clients
using GemORB.

3.  Distributed  architecture  background 

Information  distribution  of  updates  is  a  major  concern 
among  data  users.  This  is  especially  true  for  NIMA  users 
since  NIMA  is  the  only  authorized  data  producer  for 
military  users.  Changes  must  be  captured,  validated,  and 
then  incorporated  into  systems  that  utilize  the  data  for  any 
military  operations.  Furthermore,  it  is  essential  that 
military  users  have  the  latest  data  available,  as  well  as  a 
synchronized  view  of  the  space  by  various  and  multiple 
users. 

There  are  three  types  of  fundamental  changes: 
geometrical,  topological,  and  attribute.  However,  a 
geometrical  change  implies  a  topological  change;  a 
topological change cannot take place unless there is a
geometrical  change.  Thus,  this  paper  considers  only  two 
types  of  basic  changes:  geometrical  and  attribute. 

There are two concerns in maintaining vector data
when updates take place: (1) maintenance of internal
topology and (2) distribution of updates to the users.
NIMA produces Vector Product Format (VPF) data.
These vector data have an internal topology called
winged-edge topology [5]. This topology provides
adjacency and contiguity information at a primitive level,
where one or more primitives define a feature. Five types
of primitives are used in defining a VPF feature: entity
nodes for point features, connected nodes for line and
area features, edges for line and area features, faces for
area features, and rings for area features. Therefore, when
a geometrical change occurs, there is a possibility that the
underlying topology may change for neighboring objects.
This leads into the next relevant issue of distribution.
When a feature changes, its neighboring feature’s
topological relationships may also change. Thus,
determining what and how much information to send as
an update is an issue.

Most  applications  that  handle  VPF  data  understand 
each  feature  in  the  context  of  its  thematic  layer,  i.e., 
database,  library,  and  coverage.  The  underlying  relational 
structures  that  support  such  a  layer  consist  of  many  tables 
and  joins.  An  update  can  only  be  considered  at  a  coverage 
level,  although  an  update  takes  place  at  a  feature  level.  A 
coverage  may  contain  hundreds  of  features.  However, 
due  to  the  relational  format  of  the  vector  data,  an  update 
implies  potential  changes  to  an  entire  coverage.  Network 
utilization is non-optimal due to the redundant
information, i.e., information that has not been changed
will also be sent as a part of the update. With a shift from
relational  to  object-oriented  paradigm  for  VPF  in  GIDB,  a 
map-entity  or  feature-level  (object-level)  update  is 
implemented  [1].  Object-level  updating  alleviates  the  two 
problems  that  were  encountered  in  updating  traditional 
VPF  relational  databases.  Only  an  object  that  has  been 
modified  will  be  distributed.  Any  change  in  topology  does 
not  have  to  be  transmitted  as  a  part  of  an  update.  The 
GIDB  server  automatically  triggers  winged-edge  topology 
checking  whenever  a  new  primitive  or  a  change  in  a 
primitive  is  detected.  Since  only  an  object  that  has  been 
updated  is  distributed,  the  minimal  information  of  only 
the  object  that  is  immediately  changed  is  distributed  to  the 
DoD  services.  Of  course,  this  implies  that  the  candidates 
for  object-oriented  feature-level  updates  require  the 
receiving  application  to  be  object-oriented,  as  well. 

In  this  scenario,  one  can  imagine  a  server  and  client 
architecture  in  which  NIMA  is  the  centralized  data  source 
and  all  DoD  services  become  clients  as  data  users. 
Another  scenario  that  can  be  considered  is  that  of  a 
distributed  system.  A  distributed  system  is  composed  of 
systems  that  logically  belong  together,  but  that  are  in 
different  physical  locations.  A  distributed  system  is  ideal 
when  data  sharing  is  required  while  some  local  control  is 
maintained  [6].  In  the  updating  design,  a  client  has  the 
control  to  accept  the  update  or  reject  the  update  from  the 
server. Each client to the NIMA server has captured a
“mini-world,” or a geographic portion of the world. A
combination of clients to the NIMA server could possibly
become the distributed system, each providing other
applications and systems with the latest data from NIMA
covering certain geographic regions. In implementing this
distributed system, a CORBA communication network has
been used. This is shown in figure 4.


Figure  4.  Distributed  architecture  example. 


Based on figure 1, a web-based client would be a client
only, whereas the update client could be both client and
server. Both clients use CORBA as the communication
network. CORBA enables simple query and data
transmission over a network.


A dialog between a server and a client begins when a
client specifies an AOI. This specification is currently
made by providing latitude, longitude and a radius.
Currently, an interactive AOI selection directly from a
map is disabled. This interface is shown in figure 6. Well-
known and frequently accessed area names have been
provided. The listed names are those AOIs that were used
during the Urban Warrior experiment.


4.  Web  applet 

The goal of the project was to have online access to
geospatial data such as raster images and vector features
over the Internet. These geospatial objects would be
retrieved from a GemStone server. Communication
between the server and a client is accomplished using
CORBA-compliant vendor ORBs. The use of VisiBroker
on the client side and GemORB for the database server is
completely transparent to anyone accessing the applet.
Figure 5 shows the basic architecture of our system. A
web-based client has capabilities to display, select and
query objects interactively.

Figure 5. Basic system design.


Figure  6.  Area  of  interest  screen. 

Upon receiving an AOI request, the server retrieves all
data sets that contain information over the AOI. A user
will be able to tailor the display by selecting only those
data sets that are of interest. The selection process is
shown in Figure 7.


The server returns those objects that meet the selection
criteria. These objects are displayed as shown in figure 8.
Simple queries such as object attribute retrieval, as well
as advanced queries such as geometrical queries, e.g.,
objects within a distance of X kilometers, can be
executed.
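
A query for objects within a distance of X kilometers of a point given by latitude and longitude could look like the following Python sketch. The paper does not say how GIDB computes distances, so the haversine great-circle formula, the function names, and the feature record layout used here are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def features_within(features, lat, lon, radius_km):
    """Return the point features lying within radius_km of (lat, lon)."""
    return [f for f in features
            if haversine_km(lat, lon, f["lat"], f["lon"]) <= radius_km]


# Hypothetical point features with approximate coordinates.
features = [
    {"featname": "Monterey", "lat": 36.60, "lon": -121.89},
    {"featname": "Presidio", "lat": 37.80, "lon": -122.46},
]
nearby = features_within(features, 36.60, -121.90, 10.0)
```

In a thin-client design this filtering would run on the server, so that only the matching objects travel over the network.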


Figure  8.  Simple  query  screen. 


5. Information distribution

5.1. Basic system components and design

For  information  distribution  from  a  GIDB  server  to  a 
GIDB  client,  both  the  server  and  the  client  applications 
are  identical.  A  peer-to-peer  system  configuration  for 
CORBA  has  been  implemented.  A  well-defined  set  of 
methods in an IDL file is used between systems to query
and  retrieve  objects.  Any  system  can  become  a  server  and 
client  based  on  the  needs. 

The  designation  of  server  or  client  is  based  on  the  role 
a  GIDB  system  assumes.  A  GIDB  system  can  be  a  server 
to  a  suite  of  clients  for  a  certain  type  of  dataset.  However, 
the  same  GIDB  system  can  be  a  client  to  some  other 
server for another dataset. This capability demonstrates a
“smart client pull” information flow. The following
assumptions are made in using this “smart client pull”:

•  A  server  is  up  and  running  constantly.  Clients  are 
on-line  as  needed. 

•  Both  server  and  client  maintain  a  log.  A  server 
maintains  an  update  history  log.  The  client 
maintains  a  client  history  log.  These  are 
represented  in  figure  9. 

•  A  client  initiates  an  update  check.  When  a  user 
logs  onto  the  Gemstone  server,  a  request  is  sent  to 
the  server  via  ORB-to-ORB  communication  to 
check  for  any  update.  A  check  on  whether  a  client 
needs  an  update  from  a  server’s  change  log  is 
based  on  a  timestamp  and  the  state  of  the  feature 
in  terms  of  its  location  and  attributes. 


Figure  9.  System  components  for  updating. 

This “smart client pull” allows background processing
to  automatically  update  the  changes  from  the  selected 
server.  Based  on  this  assumption,  an  interactive 
processing  from  the  user  is  not  required  to  initiate  the 
update.  It  is  also  possible  to  have  no  user  interaction  for 
the  actual  update  process:  the  system  could  be  set  up  to 
automatically  update  the  changes  based  on  well-defined 
criteria.  However,  the  current  implementation  allows  a 
user  to  select  which  updates  to  perform,  if  any. 

5.2.  Update  history  log  design 

The GIDB application records all updates in a history
log. The history log is maintained as a class variable to
VPFDatabase and can be viewed by inspecting
‘VPFDatabase historyLog’. The format of the history log
is shown in figure 10.


Figure  10.  History  log  format. 


When a feature is updated, an instance of a CORBA
VectorFeature as defined in the IDL file is created and
added to the appropriate feature collection in the history
log. The coverage date/time stamp in the history log is
changed to reflect the date/time that this feature was
updated. Thus, the coverage date/time stamp reflects the
date/time of the most recent update that has occurred
within the coverage.
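
The bookkeeping just described can be sketched as follows. The actual log format is given only in figure 10, so the nested dictionary layout, the class name HistoryLog, and the record method are assumptions made purely for illustration.

```python
from datetime import datetime


class HistoryLog:
    """Per-coverage record of updated features and the latest update time."""

    def __init__(self):
        # (database, library, coverage) -> {"stamp": datetime, "features": [...]}
        self.coverages = {}

    def record(self, coverage_key, vector_feature, when):
        """Add an updated feature and move the coverage stamp forward."""
        entry = self.coverages.setdefault(
            coverage_key, {"stamp": when, "features": []})
        # Keep the time of this particular update with the feature itself.
        entry["features"].append({**vector_feature, "updated": when})
        # The coverage stamp always reflects the most recent update.
        entry["stamp"] = max(entry["stamp"], when)
        return entry


log = HistoryLog()
key = ("db1", "lib1", "cov1")
log.record(key, {"featname": "buoy", "id": 3}, datetime(1999, 1, 27, 13, 37, 37))
log.record(key, {"featname": "buoy", "id": 4}, datetime(1999, 1, 28, 9, 0, 0))
```

Storing a time with each feature as well as with the coverage lets a later request ask either "has this coverage changed at all?" or "which features changed after a given time?".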

5.3. Client history log design

When a client receives updates from a server, all
updates are recorded in the history log as described
above. In so doing, this client can then be a server to
another client. In addition to recording the updates in the
history log, a client also keeps a record of the updates in a
client history log. The client history log is maintained as a
class variable to VPFDatabase and can be viewed by
inspecting ‘VPFDatabase clientHistoryLog’. The format
of the client history log is shown in figure 11.


Figure  11.  Client  history  log  format. 


The client history log records the date/time of the latest
update for each coverage from the server. It is used to
determine whether any updates have occurred since the
last time the client was updated by the server.

5.4. Network update implementation

When a client logs on, the system automatically sends
a CORBA request to the server for a list of available
updates. During the login, the client invokes the server-
side method getUpdateLogFromServer. This server-side
method checks the server history log for updates. A list of
strings comprised of database, library, and coverage
names with timestamps, such as ‘db1-lib1-cov1-01/27/99
13:37:37’, is returned to the client. The client code then
compares timestamps from the returned list of available
updates with timestamps from the client history log to
determine if the updates are needed on the client. If the
client does need to be updated, a window appears
allowing the user to select which updates to perform, as
shown in figure 12.
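
The client-side comparison can be sketched in Python. The sketch assumes entries of the form ‘db1-lib1-cov1-01/27/99 13:37:37’, with names that contain no hyphens and a month/day/year timestamp; the function names and the dictionary used for the client history log are hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%m/%d/%y %H:%M:%S"  # assumed layout of the timestamp


def parse_update_entry(entry):
    """Split 'db-lib-cov-timestamp' into a coverage key and a datetime."""
    database, library, coverage, stamp = entry.split("-", 3)
    return (database, library, coverage), datetime.strptime(stamp, STAMP_FORMAT)


def updates_needed(server_entries, client_history_log):
    """Return the server entries that are newer than the client's record."""
    needed = []
    for entry in server_entries:
        key, server_time = parse_update_entry(entry)
        client_time = client_history_log.get(key)
        # A coverage never updated on the client always qualifies.
        if client_time is None or server_time > client_time:
            needed.append(entry)
    return needed


client_log = {("db1", "lib1", "cov1"): datetime(1999, 1, 20, 8, 0, 0)}
available = ["db1-lib1-cov1-01/27/99 13:37:37", "db1-lib1-cov2-01/15/99 10:00:00"]
to_offer = updates_needed(available, client_log)
```

The resulting list is what would populate the window in which the user picks the updates to perform.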

The user may choose to update all, some, or none of
the coverages. The items selected for update are then
added to the client history log. As an item is being added
to the client history log, the log is checked to determine if
the coverage has been updated previously. If so, the
timestamp for that coverage is updated, and the server
timestamp is replaced with the previous update
timestamp. If not, the server timestamp is replaced with
the word ‘none.’ The timestamp replacement is used to
prevent the server from sending back features that have
already been updated. After the client history log is
changed, the server-side method getFeaturesToUpdate:
updateSelections is invoked. Figure 13 shows the
exchange protocol between the server and client.


Figure 12. Available updates window.


Figure 13. Smart client pull protocol.


For  each  item  in  the  updateSelections  list,  the  server 
finds  the  collection  of  updated  features  for  the  selected 
coverage.  If  the  item  in  the  updateSelections  list  has 
‘none’  in  place  of  its  timestamp,  then  all  of  the  features 
for  this  coverage  are  placed  in  the  set  of  features  to  be 
updated.  Otherwise,  the  timestamp  from  the 
updateSelections  list  coverage  is  compared  to  the 
timestamp  of  each  feature.  If  the  feature  was  updated  at  a 
later  date  and  time  than  the  coverage  from  the  client,  the 
feature  is  added  to  the  set  of  features  to  be  updated.  This 
set  of  features  to  be  updated  is  then  returned  to  the  client. 
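
The server-side selection can be sketched as a short Python function. It is not the Smalltalk method itself: the function name, the representation of updateSelections as (coverage key, timestamp) pairs, and the per-feature "updated" field are assumptions made for this illustration.

```python
from datetime import datetime


def get_features_to_update(update_selections, history_log):
    """Collect the features a client still needs for each selected coverage.

    update_selections: list of (coverage_key, stamp) pairs, where stamp is
        a datetime or the string 'none' (client never updated this coverage).
    history_log: coverage_key -> list of feature dicts with an 'updated' time.
    """
    to_update = []
    for coverage_key, client_stamp in update_selections:
        for feature in history_log.get(coverage_key, []):
            if client_stamp == "none":
                to_update.append(feature)  # send every logged feature
            elif feature["updated"] > client_stamp:
                to_update.append(feature)  # send only newer features
    return to_update


history = {
    ("db1", "lib1", "cov1"): [
        {"id": 1, "updated": datetime(1999, 1, 20, 8, 0, 0)},
        {"id": 2, "updated": datetime(1999, 1, 27, 13, 37, 37)},
    ],
}
first_time = get_features_to_update([(("db1", "lib1", "cov1"), "none")], history)
later = get_features_to_update(
    [(("db1", "lib1", "cov1"), datetime(1999, 1, 25, 0, 0, 0))], history)
```

Here first_time holds both features, while later holds only the feature updated after the client's previous update time.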

When the client receives the set of features to be
updated, each feature in the set is updated. If the
changeType is ADD, then a new feature is created based
on the parameters of the VectorFeature. Otherwise, the
local client feature which matches the VectorFeature to be
changed, deleted, or moved must be found in the client’s
database. The local client feature is found by using the
VectorFeature featname and id. The oldAttributes and
oldCoords are then compared with the local client feature
to verify that the VectorFeature and the local client
feature are indeed the same.

There  are  two  potential  sources  for  conflict  in  the 
search  for  a  match.  First,  a  client  may  have  locally 
updated  the  feature.  Since  all  GIDB  systems  have  a 
capability  to  update  feature  data,  a  local  update  could 
have  potentially  taken  place.  A  local  update  has 
precedence  over  the  network  update.  Secondly,  a  feature 
can be uniquely identified by its database, library,
coverage,  feature  class,  and  id.  NIMA  distributes  its  data 
with  an  additional  identifier,  an  edition  number.  The  latest 
edition  will  be  a  superset  of  all  changes  from  the  previous 
editions.  The  changes  from  one  edition  to  another  may 
coincide  with  the  changes  in  the  history  log.  However,  the 
changes  that  take  place  by  NIMA  and  the  changes  via 
GIDB  are  truly  an  independent  effort.  Because  the  edition 
numbers are not maintained by GIDB (assumed to have
the  latest  released  edition),  there  may  be  a  mismatch  in 
the  edition  of  the  server  and  client.  Therefore,  using  the 
VectorFeature  featname  and  id  may  not  uniquely  identify 
a  feature.  If  the  VectorFeature  cannot  be  verified  as  a 
match  to  a  local  client  feature,  then  the  update  for  the 
VectorFeature  will  not  occur. 

When  the  feature  has  been  validated,  the  local  client 
feature  is  then  changed,  deleted,  or  moved  based  on  the 
parameters  of  the  VectorFeature.  Recall  that  the  update 
history  log  will  be  modified  to  reflect  these  updates  from 
the  server. 
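
The client-side handling described in this section can be sketched as below. This is a hedged Python illustration rather than the GIDB code: `LocalFeature`, `apply_update`, the in-memory dictionary standing in for the client database, and the change-type strings other than ADD are assumptions. A local edit is detected here simply because the old values no longer match, so the network update is skipped and the local update keeps precedence.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coord = Tuple[float, float]
Key = Tuple[str, int]  # (featname, id), as used to find the local feature


@dataclass
class LocalFeature:
    featname: str
    id: int
    attributes: Dict[str, str]
    coords: List[Coord]


@dataclass
class VectorFeature:
    change_type: str  # "ADD", "CHANGE", "DELETE" or "MOVE" (labels assumed)
    featname: str
    id: int
    old_attributes: Dict[str, str] = field(default_factory=dict)
    old_coords: List[Coord] = field(default_factory=list)
    new_attributes: Dict[str, str] = field(default_factory=dict)
    new_coords: List[Coord] = field(default_factory=list)


def apply_update(db: Dict[Key, LocalFeature], vf: VectorFeature) -> bool:
    """Apply one server update; return False when the update is skipped."""
    key = (vf.featname, vf.id)
    if vf.change_type == "ADD":
        db[key] = LocalFeature(vf.featname, vf.id,
                               dict(vf.new_attributes), list(vf.new_coords))
        return True
    local = db.get(key)
    if local is None:
        return False  # no candidate match in the client database
    # Verify the match: old values must agree with the local feature.
    if local.attributes != vf.old_attributes or local.coords != vf.old_coords:
        return False  # locally edited or different edition: do not update
    if vf.change_type == "DELETE":
        del db[key]
    elif vf.change_type == "MOVE":
        local.coords = list(vf.new_coords)
    elif vf.change_type == "CHANGE":
        local.attributes = dict(vf.new_attributes)
    else:
        return False  # unknown change type
    return True
```

Recording each applied update in the client history log is omitted from this sketch.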

5.5. Near-real-time network update example

The  GIDB  network  update  capability  was  tested  during 
the  Marine  Corps  Warfighting  Labs  Urban  Warrior 
Advanced Warfighting Experiment in March '99. In
preparation for the update test, a GIDB server was
installed at NIMA in Bethesda, MD. A GIDB client was
installed  on  the  USS  Coronado.  On  March  8,  updates 
were  made  on  the  NIMA  server  for  features  in  the 
DNC01  database,  harbor  library.  In  the  Navigational 
coverage,  several  buoys  were  selected,  and  attribute 
values  for  one  were  changed.  In  the  Hydrography 
coverage, attribute changes were made on bottom
characteristic  points.  On  March  10,  the  client  on  the  USS 
Coronado  received  the  updates  from  the  NIMA  server, 
and  the  updates  were  verified  on  the  client.  At  the  time  of 
these  experiments,  the  USS  Coronado  was  at  sea  in  the 
Pacific  Ocean,  and  the  CORBA  communication  occurred 
over  a  network,  via  satellite  transmissions  from  ship  to 
shore. 


A similar test was performed on Thursday, March 11.
The purpose of this test was to demonstrate that the client
could be updated in near-real-time with updates from the
NIMA server. Several features on the NIMA server were
updated from a DNC01 general library and a VMAP
Level 2 mission specific dataset. A few minutes later, the
client on the USS Coronado was able to receive and
perform these same updates. Quantitative measures of the
update  times  were  not  taken  during  this  initial  testing  of 
the  GIDB  network  update  capability.  On  average, 
retrieval  of  the  list  of  updates  from  the  server  over  a 
network  took  about  5  seconds.  Retrieval  and  update  of 
features  took  on  average  less  than  30  seconds  per  feature, 
and  this  average  decreased  as  the  number  of  updated 
features  increased.  Performance  of  the  network  updating 
is  extremely  hard  to  measure  due  to  the  variety  of 
parameters involved: the number of features being
updated,  the  types  of  updates  to  be  performed,  the  traffic 
on  the  network,  the  load  of  the  client/server,  etc. 
However,  benchmarking  studies  will  be  performed  in  the 
near  future. 

6.  Concluding  remarks  &  future  work 

In  this  paper,  we  have  shown  how  a  web-based 
distributed  system  for  retrieval  and  updating  of  mapping 
objects  was  implemented  in  the  GIDB.  The  GIDB 
architecture  relies  heavily  upon  object  technology  and 
includes:  a  Smalltalk  server  application  interfaced  to  a 
Gemstone  ODBMS,  Java/applet-based  client  applications, 
and  CORBA  middleware  in  the  forms  of  VisiBroker  and 
GemORB.  The  GIDB  was  the  realization  of  our  goal  to 
have  NIMA  data  available  for  electronic  information 
distribution  and  updating,  and  played  a  significant  role  in 
the  Marine  Corps  Warfighting  Labs  Urban  Warrior 
Advanced  Warfighting  Experiment  earlier  this  year.  The 
architectural  components  of  the  system  worked  well 
together; using Smalltalk as the server development
environment  allowed  us  to  quickly  prototype  new 
capabilities,  while  Java  provided  the  web-based 
capabilities  for  the  user  interface.  CORBA  proved  an 
excellent  choice  to  serve  as  a  bridge  between  the  two. 

We  are  continuously  working  to  expand  the  scope  of 
capabilities  of  the  GIDB,  and  to  exploit  the  power  of 
object-based  technology  for  advancing  the  state-of-the-art 
in  digital  spatial  data  handling.  Currently,  we  are  working 
on  completing  the  updating  functionality  and  adding  the 
use  of  map  symbology  for  the  display.  For  the  near  future, 
we  plan  to  implement  a  rule-based  distributed  conflation 
system  for  the  GIDB  [4]  that  will  automatically  handle 
those cases in which a single feature is stored multiple
times.

[...] Common Object Request Broker Architecture (CORBA) and Virtual Reality Modeling Language (VRML) technology for the Marine Corps. The GIDB allows the [...] of all digital mapping types (e.g., raster, vector, audio, [...]) [...] from a simple browser and directed by any area of interest [...] military and the commercial sector. [...] The technology will provide [...] constrained queries. The GIDB also enables [...] of interest in the database to render itself in 3D [...] update itself [...]

INTRODUCTION  AND  BACKGROUND 

[...] The Defense Modeling and Simulation Office (DMSO) and the National Imagery and Mapping Agency (NIMA) sponsored [...] Naval Research Laboratory (NRL) [...] prototype. Digital Nautical Chart (DNC) [...] NIMA's digital vector mapping [...]. NRL teamed with the University of Florida to develop this prototype.

[...] implemented by NIMA. VPF is a relational [...] (Defense Mapping Agency, 1993) dataset [...] (features), along with their spatial and non-spatial attributes [...]. The project addressed the following areas: topological support among coverages, potential for not duplicating features among coverages, improved updating potential, and increased access speed. At project completion, key findings reported to NIMA included the following:

• An OO database with DNC information content could be implemented.

•  Feature  content  was  only  stored  once  as  compared  to  the  repeated  storage  of  some  features  in  the 
conventional  DNC. 

•  Direct  updating  of  features  and  attributes  was  demonstrated. 

•  An  order  of  magnitude  speedup  was  demonstrated  in  feature  access  time. 

•  Features  and  attributes  can  easily  be  modified  via  a  point-and-click  interface. 

NRL extended the prototype OO structure in FY 95 to accommodate multiple VPF databases as well as
two different OO database management systems (ODBMS) (Shaw et al., 1998). This prototype is called
the Object-Oriented Vector Product Format (OVPF) and represents a transformation of NIMA VPF
relational databases into an OO structure. The OVPF allowed NIMA a rapid look at the potential benefits
of OO approaches for allowing Department of Defense users to ask for information from NIMA that spans
multiple databases (Shaw et al., 1997).

During FY 96, much progress on conflation (Cobb et al., 1998; Foley et al., 1997) and the base
research for an integrated OO framework (Shaw et al., 1997) to support multiple NIMA data types was
accomplished. In FY 97, NRL developed the first prototype of the integrated OO framework, as well as
developing a prototype OO Digital Nautical Chart updating system for NIMA. This system showed a 24:1
speedup over NIMA's current approach. Also, NRL developed the initial CORBA interface for the
integrated framework to allow improved electronic dissemination of digital mapping objects (Cobb et al.,
1998; Wilson et al., 1998).

Marine  Corps  Warfighting  Lab  (MCWL)  funded  the  NRL  to  develop  the  initial  GIDB  during  FY  98 
and  FY  99  to  support  an  Integrated  Marine  Corps  Multi-Agent  Command  and  Control  System 
(IMMACCS).  This  initial  prototype  demonstrated  an  ability  to  actively  manage  NIMA  mapping  data  in  an 
object-oriented  manner  that  could  be  interfaced  with  components  developed  by  Stanford  Research  Institute, 
California  Polytechnic  Institute,  and  NASA's  Jet  Propulsion  Laboratory.  The  GIDB  demonstrated  that  this 
integrated  database  could  be  viewed  and  engaged  in  both  2D  and  3D  via  a  simple  Internet  browser.  The 
GIDB  was  a  component  of  the  IMMACCS  successfully  used  in  the  Urban  Warrior  Advanced  Warfighting 
Experiment in March '99.

The underlying motivation for having an Internet-based Java client access our OO mapping database is
to give end users the ability to access and use NIMA data quickly and efficiently. Currently, users of
NIMA data must have resident on their own computer systems software to view the data, and must obtain
the data on CD-ROM or other storage media. However, given NIMA's role as the primary geographic data
distributor  for  the  Department  of  Defense,  it  is  clear  that  electronic  dissemination  and  remote  updating  of 
NIMA's  digital  products  is  highly  desirable.  To  this  end,  the  GIDB  allows  any  user  with  a  Java-enabled 
web  browser,  such  as  Netscape  4.0,  to  access  our  database  over  the  Internet  and  display  NIMA  map  data 
available  in  their  area  of  interest.  Additionally,  privileged  users,  known  as  data  co-producers,  are  given  the 
ability  to  perform  remote  updates  on  the  data. 

This  paper  presents  the  design  and  underlying  architecture  of  the  IMMACCS  as  well  as  the  GIDB 
architecture  as  used  in  the  project  described  above.  The  remainder  of  this  paper  is  organized  as  follows. 

The  section  immediately  following  provides  a  high-level  description  of  the  overall  architecture,  followed 
by  emphasis  on  the  distributed  aspects  of  the  architecture  in  the  subsequent  section.  Following  that,  we 
focus  on  the  web  applet,  which  is  the  mechanism  by  which  remote  viewing  and  updating  of  information  is 
performed.  Details  of  the  network  updating  scheme  are  then  presented,  including  a  description  of  the 
IMMACCS  updating  test.  In  the  final  section,  we  present  our  concluding  remarks  and  indicate  directions 
for future work.

IMMACCS ARCHITECTURE

[...] Urban [...] introduces some [...] complication in that [...] space with obstructed view. Marines are sent to urban settings to evacuate [...] situations. In preparation for potential urban warfare, MCWL has propelled an experiment and demonstration of how to conduct warfare using the latest technology available. In 1997, an exercise called [...] digital (more dependent on micro-chips) [...].

[...] Combat Operations Center (COC) [...] command [...] objective in helping to effectively control the urban warfare. The IMMACCS consists of [...] viewer, communication backbone, intelligent agents [...]. A real-time display is required to visualize the common tactical picture by officers in the ECOC as activities take place in the battle space. Stanford Research Institute (SRI) developed and provided [...] two-dimensional (2-D) viewer of the urban battlespace. Jet Propulsion Laboratory (JPL) provided a backbone (ShareNet) for all communication among the IMMACCS components. [...] Common Object Request Broker Architecture (CORBA) was the [...] different components of the IMMACCS. California State Polytechnic Institute [...] developed and provided the intelligent agents that were required to assist in the decision making [...] ECOC. SPAWAR's MCSIT ingested all track information from [...] Joint Maritime Command Information System (JMCIS) and translated it into object-oriented format for the IMMACCS components to access and use. SRI's 2D viewer [...] required urban infrastructure to provide a visualization of the battle space as well as a capability to reason about the surrounding.

[...] of information: static and dynamic. Dynamic information deals [...] helicopters [...] of Marines. This information [...] through MCSIT as well as other GPS signals fed into the system provided by SRI. Static information provides both physical and built-up environments: man-made structures (e.g., buildings, facilities, [...]) and natural [...] (e.g., topography, [...]) [...] position as well as their occupancy in space. The term geospatial will be used to refer to the static information [...]. Dynamic information is based upon the urban infrastructure in terms of its position [...] operation. Maps provide static information, and are the basis for [...] operation, including urban warfare.

NIMA has been designated as the VPF map provider to [...] services. [...] provided the vector map information using relational tables. However, MCWL specified [...] overall [...] to be object-oriented. NRL ingested NIMA's [...] object format. The GIDB developed by NRL was used as the [...].

[...] Object Instance Store (OIS) maintained by ShareNet. [...] system component. The OIS stores only the attributes of urban infrastructure objects; positional information of these objects is not maintained in the OIS. The InCon currently [...] vector maps for the infrastructure objects; an image is used as a reference map. Therefore, InCon needs [...] GIDB to determine which objects are available in the area of interest (AOI). Both [...] GIDB [...] a global identification of each infrastructure object. When the GIDB provides the global identification of the objects to InCon, InCon then requests other information from the OIS. This two-step [...] is implemented because the attributes of the infrastructure objects [...] attributes defined for each infrastructure object in the IMMACCS object model. The OIS provides more information relevant to the IMMACCS environment. Due to the imposed requirement [...], CORBA was successfully utilized to [...] system, IMMACCS, from the different system components.

[...]

[...] vector [...] information to object-oriented format.

2. Upload the [...] Object [...] Store via CORBA.

3. Allow InCon to [...].

Figure 1 shows the GIDB's [...] role within the overall IMMACCS architecture.


FIGURE 1

IMMACCS  ARCHITECTURE 


FIGURE  2 

GIDB  ARCHITECTURE 


[Figure 2 components: AOI-based client (display, update), GemBuilder for Smalltalk, Gemstone Database, GemORB, Update Client, Web-based Applet]

Gemstone is an object [...]. The server consists of two functional modules [...] applications. GemORB is a CORBA 2.0-compliant object request broker that works in conjunction with [...].

[...] An establishment of a connection requires naming resolution between the [...] and the object server. In other words, the client and the server have an agreement on how to reference an object by name.

GemORB establishes a connection [...] Common Object Request Broker Architecture (CORBA) compliant [...] the server. Since GemORB is CORBA-based, all the benefits of interoperability among [...].

FIGURE 3

GEMSTONE  OBJECT  SERVER  CONFIGURATION 


The Area-of-Interest (AOI) based client [...] GemBuilder for Smalltalk to [...] object server. This application [...] data maintenance, [...]. An object may be [...] very nested object of other object types. [...] object that reflects the actual implementation [...] used by AOI-based clients. [...] AOI-based applications do not [...]. These applications minimize [...] information maintenance and storage by [...] information. These applications only [...] of information to perform the requested function of a user. Thus, AOI-based applications [...] represented on the client [...].

Area-of-interest is implemented by using a spatial indexing scheme. The GIDB system uses a quadtree spatial indexing scheme to provide a [...] clustering of data based on geographic area. A quadtree recursively divides an area into quadrants [...] (which is called a quadcell). A detailed discussion of a quadtree indexing scheme can be [...] in the GIDB [...] a class named VPFSpatialDataManager. [...] All spatial [...] of the quadtree is based on the geographic coordinates of the bounding box of the object. [...] bounding box of an object is selected to [...]. [...] VPFSpatialDataManager class. Any data access [...] specifying its database, library and coverage. A feature retrieval request may specify [...].

[...] capabilities [...] specifically for the IMMACCS, the GIDB provides the following additional capabilities:

1. Stand-alone application as well as web-enabled application.

2. Spatial query (e.g., topological, geometrical and attribute-constrained queries) via Web browser and in stand-alone mode.

3. Network access with an optimized [...] by the server responding only to the specified request from the user.

4. Request and visualize information by an area of interest.

5. Local dynamic updating or modification at a feature or [...] level (e.g., add, delete, modify [...]).

6. Real-time updates from the data provider, e.g., NIMA.

7. Allow those changes and modifications to be exported back into VPF format.

8. Multimedia integration of audio, video, etc. based on the area of interest [...] show a video clip or play an audio clip of some event in the area of interest.

9. Provide weather, news, etc., by exploiting [...].

10. Demonstrate 3-D rendering of an area of interest [...].

The remainder of this paper will focus on the web-enabled capability of the GIDB as well as the 3-D modeling effort.

INTERNET APPLET

[...] capabilities for a set of geographic [...] Smalltalk [...] GemStone ODBMS acting as a server on a [...]. Communication between the Java applet, which is embedded in an HTML document on [...], [...] anyone accessing the applet. Figure 4 shows the basic architecture of the [...].


FIGURE 4

WEB-ENABLED SYSTEM ARCHITECTURE

[...] area of interest is selected by [...] region specified by latitude, longitude [...]. Alternatively, for some of the [...] of interest to the Marines, a selectable list is provided; this can be seen in Figure 5.


All objects that meet the [...] are then displayed [...]. [...] constrained by topology, geometry or attributes can be invoked, as shown in Figure 7.


FIGURE  7 

DISPLAY  AND  QUERY  SCREEN 


THREE-DIMENSIONAL VISUALIZATION


[...] VPF+ database for the United States Public Health Service [...]. [...] floor plans of [...] were the only required inputs to this process. [...] plans provided detailed information about the building, such as floor layouts, dimensions and means of entry. One of the VPF+ tools we have developed is an on-screen digitizer that allows [...] scanned floor plans to extract 3D geometric data and automate production. This allows accurate definition of the overall exterior of the building, and accurate placement of interior rooms, [...] and doorways by defining the nodes, edges and faces that form the three-dimensional structure [...] of the hospital. 3D data for the Marine [...].


FIGURE 5

3D MODEL OF USPS PUBLIC HOSPITAL

Both a visualization of the 3D model at the current level of the Marine, and a 2D map that provides the
current directional information are provided to enable those Marines inside a building to orient themselves
within the overall context of the building. This can be seen in Figure 10. A pointer on the 2D map shows
the user's location within the building and direction the user is heading.


FIGURE 10

TRACKING  OF  A  MARINE  INSIDE  A  BUILDING 


A summary of 3-D display capabilities is provided in Figure 11.


FIGURE 11

SUMMARY  OF  3D  VISUALIZATION 


Three Views of the U.S. Public Health Service
Hospital located at the Presidio, San Francisco,
California. Figure 1 shows a typical 2D
representation. Figure 2(a) shows a 3D-object
model. Figure 2(b) shows a cut-away of the first
floor including inside rooms, etc.

Figure  1. 

Figure  2(a). 

Figure 2(b).

CONCLUDING REMARKS AND FUTURE WORK

[...] military warfare environment [...] geospatial information retrieval and visualization was presented in detail. Object-oriented [...] developing [...] application. Furthermore, the object-oriented [...] project, along with the use of CORBA, enhanced interoperability among the separate [...] components. Functionalities such as web-enabling were also achieved. Three-dimensional support [...] to operate more effectively with [...] environmental and situational [...]. Future work includes implementing VPF+ in the GIDB. Currently, the 3D generation is a stand-alone [...]. [...] additional research and development for integrating the GIDB with the 3D rendering is in progress.

Abstract. The Digital Mapping, Charting and Geodesy Analysis Program of the Naval Research Laboratory has
developed  an  object-oriented  digital  mapping  database  prototype,  called  the  Geospatial  Information  Database  (GIDB). 
This  database  application  is  capable  of  importing  any  of  the  National  Imagery  and  Mapping  Agency's  (NIMA’s) 
Vector Product Format data and converting the data into an object format. Other supported NIMA data types include
Raster  Product  Format,  Text  Product  Standard,  and  Digital  Terrain  Elevation  Data.  The  GIDB  supports  multimedia 
data  as  well,  including  audio,  video,  and  imagery  (GIF  and  JPEG). 

Our  approach  involves  partitioning  the  globe  into  latitude-longitude  cells,  since  retrieval  of  objects  in  the 
application  is  always  based  on  a  user-defined  area  of  interest.  Because  any  spatially  referenced  data  can  be  indexed  by 
the  quadtree,  spatial  range  queries,  i.e.,  which  objects  are  located  within  a  particular  area,  are  efficiently  processed  for 
the  multiple  data  types  stored  in  the  GIDB.  Each  object  is  defined  to  have  a  minimum  bounding  rectangle,  or  latitude- 
longitude  bounding  box,  which  is  used  to  determine  its  placement  within  a  quadtree.  In  this  paper,  a  brief  description  of 
the  vector,  raster,  text,  and  gridded  data  types  that  are  stored  in  the  GIDB  is  presented,  followed  by  a  detailed 
description  of  the  basic  quadtree  design.  The  utilization  of  the  resulting  quadtree  organization  is  then  outlined  and 
discussed;  specifically,  the  method  by  which  objects  are  placed  in  the  quadtree,  as  well  as  the  algorithms  for  object 
retrieval,  are  analyzed. 


1.  Introduction.  The  Digital  Mapping,  Charting  and  Geodesy  Analysis  Program  (DMAP)  of  the 
Naval Research Laboratory began an effort in 1994 to develop an object-oriented (OO) digital mapping
database, called the Geospatial Information Database (GIDB) [1]. This database application is capable of
importing  multiple  data  formats  and  storing  the  data  in  a  quadtree  data  structure  for  retrieval.  The  first  data 
format  to  be  implemented  in  the  GIDB  was  Vector  Product  Format  (VPF)  data  from  the  National  Imagery 
and  Mapping  Agency  (NIMA)  [2].  The  data  is  converted  into  an  object  format  and  persistently  stored  in  a 
GemStone  object-oriented  database  management  system  (ODBMS).  Other  supported  NIMA  data  types 
include  Raster  Product  Format  (RPF)  and  Text  Product  Standard  (TPS).  The  GIDB  supports  multimedia 
data  as  well,  including  audio,  video,  and  imagery  (GIF  and  JPEG). 

These  supported  data  types  are  stored  in  quadtree  data  structures  within  the  GIDB.  A  quadtree  is  an 
indexing  structure  used  for  the  storage  and  retrieval  of  two-dimensional  data.  The  principle  upon  which  it  is 
based  is  simply  the  regular  recursive  subdivision  of  blocks  of  spatial  data  into  four  equal-sized  cells,  or 
quadrants.  Cells  are  successively  subdivided  until  some  criterion  is  met,  usually  either:  (1)  each  cell 
contains homogeneous data, e.g., a single "feature" for vector data, or rasters containing the same value
(known  as  a  region  quadtree),  or  (2)  a  preset  number  of  decomposition  iterations  has  been  performed  [3]. 
Thus,  the  cells  of  a  quadtree  are  of  a  standard  size  (in  powers  of  two)  and  are  non-overlapping  in  terms  of 
areal  representation.  A  tree  structure  is  then  constructed  by  arranging  each  cell  as  the  parent  of  its 
component  quadrant  cells.  This  structuring  leads  to  a  tree  whose  nodes  at  any  level  of  the  tree  are  all  of  the 
same  size,  that  size  being  exactly  one-fourth  the  size  of  the  nodes  at  the  next  higher  level  of  the  tree. 
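
The recursive subdivision can be illustrated with a short Python sketch. It is not the GIDB implementation: the names `Cell`, `subdivide`, and `decompose` are hypothetical, cell "area" is measured naively in square degrees, and the stopping rule is the second criterion above (a preset number of decomposition iterations).

```python
from typing import List, NamedTuple


class Cell(NamedTuple):
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def area(cell: Cell) -> float:
    """Extent of the cell in square degrees (not true surface area)."""
    return (cell.max_lon - cell.min_lon) * (cell.max_lat - cell.min_lat)


def subdivide(cell: Cell) -> List[Cell]:
    """Split a cell into four equal-sized, non-overlapping quadrants."""
    mid_lon = (cell.min_lon + cell.max_lon) / 2.0
    mid_lat = (cell.min_lat + cell.max_lat) / 2.0
    return [
        Cell(cell.min_lon, cell.min_lat, mid_lon, mid_lat),  # south-west
        Cell(mid_lon, cell.min_lat, cell.max_lon, mid_lat),  # south-east
        Cell(cell.min_lon, mid_lat, mid_lon, cell.max_lat),  # north-west
        Cell(mid_lon, mid_lat, cell.max_lon, cell.max_lat),  # north-east
    ]


def decompose(root: Cell, levels: int) -> List[List[Cell]]:
    """Return the cells at each tree level after a fixed number of splits."""
    tiers = [[root]]
    for _ in range(levels):
        tiers.append([quad for cell in tiers[-1] for quad in subdivide(cell)])
    return tiers
```

Each level holds four times as many cells as the level above it, and every cell at a given level has one-fourth the extent of its parent, which matches the size relationship stated above.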

In  this  paper,  a  brief  description  of  the  various  data  types  that  are  stored  in  the  GIDB  is  presented, 
followed  by  a  detailed  description  of  the  basic  quadtree  design.  The  utilization  of  the  resulting  quadtree 
organization  is  then  outlined  and  discussed.  The  methods  by  which  objects  are  placed  in  the  quadtree,  as 
well  as  the  algorithms  for  object  retrieval,  are  analyzed. 

2.  GIDB  Data  Formats.  VPF  features  are  geographic  data  objects  composed  of  both  spatial  and  non- 
spatial  information.  The  non-spatial  information  includes  source  data,  such  as  where  the  data  were 
obtained,  and  attribute  data.  For  example,  if  the  VPF  feature  were  a  building,  it  would  include,  among  other 
data,  information  about  the  building’s  height,  width,  name,  and  use.  Spatially,  each  VPF  feature  contains 
latitude-longitude coordinate information. A VPF feature can be a point, a line, or an area feature. A point
consists of a single latitude-longitude coordinate pair. A line is a sequence of latitude-longitude points, and
an  area  is  a  closed  polygon  region.  Each  line  and  area  feature  is  defined  to  have  a  latitude-longitude 
bounding  box,  or  minimum  rectangular  region  that  completely  encloses  the  feature.  For  point  features,  the 
bounding  box  is  determined  using  a  small  offset  value. 

Each  VPF  feature  also  has  topology  associated  with  it.  Topology  is  the  study  of  the  characteristics  of 
geometrical  objects  that  are  independent  of  the  underlying  coordinate  system,  such  as  adjacency  and 
contiguity [4]. The topological information is provided to improve spatial analysis capabilities [5]. There is
a difficulty in representing complex geographic data, such as VPF features, by decomposing the data to fit
the flat-file structure imposed by a relational database model. A change in the topology of a VPF feature,
for  example,  will  result  in  changes  to  numerous  tables  in  the  relational  model.  With  our  object-oriented 
model  in  the  GIDB,  topological  and  other  relationships  among  features  can  be  handled  more  simply  and 
directly  due  to  encapsulation,  inheritance,  and  polymorphism. 

A  minimum  bounding  rectangle  (MBR)  can  be  used  as  an  approximation  to  an  actual  spatial  entity  [6]. 
An  MBR  is  a  minimum  x-y  parallel  rectangular  region  necessary  to  completely  enclose  the  spatial 
representation  of  data,  as  shown  in  Figure  1.  The  use  of  MBRs  in  geographic  databases  is  widely  practiced 
as  an  efficient  way  of  locating  and  accessing  objects  in  space  [7].  A  major  advantage  of  an  MBR 
representation  is  that  all  objects  are  represented  as  2-D  objects  across  which  operations  can  be  applied 
uniformly.  Fortunately,  VPF  line  and  area  data  store  MBR  information  in  the  form  of  the  latitude-longitude 
bounding box. The spatial component of VPF point data is the coordinate itself. However, a minimal offset
could  be  imposed  with  a  radial  distance  using  the  coordinate  location  as  a  center,  creating  an  MBR  for  the 
point  feature.  Hence,  an  MBR  representation  can  be  used  for  each  VPF  feature. 
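
As a rough illustration of how an MBR could be derived for each feature type, the following Python sketch computes a latitude-longitude bounding box from a coordinate list. The function name `mbr`, the `(longitude, latitude)` tuple order, and the default offset value are assumptions made for the example; in VPF the boxes for line and area features are already stored with the data.

```python
from typing import List, Tuple

Coord = Tuple[float, float]             # (longitude, latitude), order assumed
Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def mbr(coords: List[Coord], point_offset: float = 0.0001) -> Box:
    """Return the minimum bounding rectangle of a point, line, or area feature.

    A single coordinate (point feature) has no extent, so it is padded by
    point_offset on every side, using the coordinate as the center.
    """
    if not coords:
        raise ValueError("a feature needs at least one coordinate")
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)
    if len(coords) == 1:
        min_lon -= point_offset
        max_lon += point_offset
        min_lat -= point_offset
        max_lat += point_offset
    return (min_lon, min_lat, max_lon, max_lat)
```

With a uniform box for every feature, the same placement and retrieval logic can be applied to points, lines, and areas alike.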


VPF  Objects 


Point  Line  Area 

FIG.  1.  Minimum  Bounding  Rectangles  (MBRs)  for  VPF  Features. 

Although  the  GIDB  was  originally  developed  for  storage  and  access  of  only  VPF  features, 
development  has  expanded  to  include  other  data  types.  NIMA’s  RPF  data  type  is  a  format  to  store  satellite 
imagery  and  digitized  aeronautical  charts.  RPF  data  is  composed  of  rectangular  frames,  and  each  frame  is 
composed  of  smaller  subframes.  The  subframe  itself  is  a  rectangular  region,  so  it  can  be  used  as  an  MBR  to 
correlate  to  the  MBR  of  VPF  features.  Figure  2  shows  a  sample  RPF  image. 


FIG. 2. Sample RPF Images.

NIMA’s TPS data is a text file that contains relevant information for the user such as notices of
changes or lessons learned from military missions. TPS data is in SGML format, and embedded within
each data set is a simple browser to display the TPS data. Each TPS file has a gazetteer that includes an
associated  latitude-longitude  position,  so  an  MBR  can  be  defined  for  it  in  the  same  way  as  for  a  VPF  point 
feature. 

An  MBR  is  defined  for  multimedia  data  as  well,  representing  the  geographic  region  relevant  to  the 
audio  clip,  video  clip,  or  imagery.  Audio  and  video  clips  can  be  of  any  standard  format.  The  imagery  can 
be  photographs,  floor  plans,  or  other  items  stored  in  GIF  or  JPEG  format. 

The  MBR  is  defined  for  every  object  type  to  be  stored  in  the  GIDB.  Consequently,  the  MBR  can  be 
used  to  determine  how  and  where  an  object  is  stored  in  the  database.  Use  of  the  MBR  is  the  basis  for  the 
quadtree  design  and  organization  that  is  implemented  in  the  GIDB. 

3.  Quadtree  Organization.  A  quadtree  is  formed  by  the  regular  recursive  subdivision  of  blocks  of 
spatial  data  into  four  equal-sized  cells,  or  quadrants.  In  our  implementation,  a  cell  is  not  created  unless  the 
cell  has  data  or  at  least  one  of  its  subsequent  children  cells  has  data.  Since  cells  are  created  only  when  they 
are  needed,  search  time  for  objects  is  reduced  and  memory  is  not  wasted.  To  determine  whether  an  object  is 
to be placed in a cell, its MBR is used. A simple intersection computation determines the cell
that houses a given spatial object: if the MBR intersects at least one of the dividing lines used to form a
cell's children, the object cannot fit in any child and stays in that cell; otherwise the test is repeated in the
child that contains the MBR. An object is therefore placed in the smallest
cell that completely contains its MBR. In other words, an object is stored at the lowest level in which it fits
in  the  quadtree.  Figure  3  shows  a  sample  subdivision  of  an  image  into  quadrants  and  the  resultant  quadtree 
representation. 


FIG. 3. Sample Subdivision into Quadrants and the Resultant Quadtree Representation
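
The placement rule in this section can be expressed as a short descent from the root. The sketch below is an illustration under stated assumptions rather than the GIDB implementation: boxes are plain (min_x, min_y, max_x, max_y) tuples, the root cell is counted as level 1, and the function name smallest_enclosing_cell is hypothetical.

```python
def smallest_enclosing_cell(mbr, root=(-180.0, -180.0, 180.0, 180.0), max_level=20):
    """Return (level, cell) for the smallest quadtree cell that fully contains mbr."""
    min_x, min_y, max_x, max_y = mbr
    x0, y0, x1, y1 = root
    if not (x0 <= min_x <= max_x <= x1 and y0 <= min_y <= max_y <= y1):
        raise ValueError("MBR must lie inside the root cell")
    level = 1                      # assumption: the root cell is level 1
    while level < max_level:
        mid_x, mid_y = (x0 + x1) / 2, (y0 + y1) / 2
        # An MBR that crosses a dividing line cannot fit in any child cell.
        if min_x < mid_x < max_x or min_y < mid_y < max_y:
            break
        # Otherwise step into the quadrant that holds the MBR.
        if min_x >= mid_x:
            x0 = mid_x
        else:
            x1 = mid_x
        if min_y >= mid_y:
            y0 = mid_y
        else:
            y1 = mid_y
        level += 1
    return level, (x0, y0, x1, y1)
```

A large feature that straddles the equator and the prime meridian stays at the root, while a point feature descends until the depth limit stops it, which is why the limit matters for point-heavy data.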

All  of  the  data  to  be  stored  in  our  database  is  digital  map  data  or  data  that  can  be  geographically 
represented.  Consequently,  we  define  the  root  cell  of  our  quadtree  to  be  the  minimum  bounding  square  of 
the world. The root cell has a latitude-longitude bounding box of origin -180, -180 and corner 180, 180.
Note  that  each  cell  of  our  quadtree  will  be  a  square  region,  which  is  a  design  consistent  with  most 
implementations  of  a  quadtree  structure.  Within  this  application,  the  depth  of  a  quadtree  has  been  set  to  20. 
The  setting  of  a  maximum  depth  implies  that  an  extent  of  a  cell  at  level  20  is  about  1.4555  km  x  1.4555 
km.  It  is  necessary  to  place  an  upper  bound  on  the  levels  of  the  quadtree  since  a  large  number  of  VPF  data 
are  point  features.  An  unbounded  quadtree  would  be  very  deep  to  include  cells  that  contain  point  features. 
Such  a  quadtree  would  also  contain  a  large  number  of  small  children  cells.  Also,  since  a  quadtree  search 
always begins from the root of a tree, a limitation on the depth of a tree improves data search and retrieval
as well. The maximum depth of the quadtree can be easily increased in the future so cells at the lowest
level cover a smaller geographic area.

RUTH A. WILSON, MARIA COBB, MIYI CHUNG AND KEVIN SHAW

A  quadtree  is  implemented  in  our  system  using  an  object-oriented  methodology.  According  to  Frank 
and  Egenhofer,  there  is  a  paradigm  shift  of  spatial  data  from  a  relational  data  structure  to  an  object-oriented 
representation  [8].  Frank  and  Egenhofer  concluded,  “spatial  data  are  too  complex  to  be  managed  within  a 
relational  data  model.”  For  this  reason,  an  object-oriented  approach  using  the  Smalltalk  programming 
language  is  implemented  in  the  GIDB  application.  An  introduction  to  object-oriented  class  diagrams  and 
terms  developed  and  designed  for  the  GIDB  is  presented  in  [9]. 

A  quadtree  is  implemented  in  the  GIDB  application  as  SpatialDataManager  and  SpatialDataCell 
classes.  As  the  name  suggests,  the  SpatialDataManager  manages  the  spatial  tree  indexing  framework,  or 
quadtree.  The  SpatialDataManager  class  has  an  instance  variable  topCell  that  represents  the  root  of  the  tree. 
This  instance  variable  is  a  pointer  to  the  cell  that  represents  the  spatial  bounds  of  all  available  data.  The 
SpatialDataManager  class  has  another  instance  variable  maxLevel  that  gives  the  maximum  depth  of  the 
quadtree.  This  maximum  depth  can  be  changed  as  more  features  are  added  to  the  quadtree  so  cells  at  the 
lowest  level  contain  fewer  features  and  cover  a  smaller  geographic  region. 


FIG. 4. SpatialDataManager, SpatialDataCell and SpatialContainer Class Diagrams

A topCell is an instance of the SpatialDataCell class which has instance variables of superCell, level,
manager,  origin,  corner,  container,  and  children  cells  1,  2,  3,  and  4.  A  superCell  is  a  pointer  to  the  cell's 
parent  cell.  The  level  gives  the  level  of  the  cell  in  the  quadtree.  Manager  is  a  pointer  to  the  quadtree  itself. 
The  origin  and  corner  provide  the  latitude-longitude  MBR  of  the  cell.  The  container  is  a  pointer  to  an 
instance  of  the  SpatialContainer  class  that  contains  the  features  located  in  the  cell.  A  cell  can  have  a 
maximum  of  four  children  cells.  When  a  cell  is  instantiated,  the  children  cells  1,  2,  3,  and  4  are  initialized 
with  a  nil  or  null  value.  Each  child  cell  becomes  an  instance  of  SpatialDataCell  only  when  data  needs  to  be 
added  to  the  child  cell  or  to  one  of  its  children. 

A  container  is  an  instance  of  the  SpatialContainer  class  that  has  instance  variables  of  cell  and  features. 
The cell is a pointer to the cell that referenced the container. Features is a collection of all spatial objects
for which the referencing cell is the lowest level cell completely containing each object's MBR. A diagram
of the SpatialDataManager, SpatialDataCell, and SpatialContainer classes is given in Figure 4.

4. Adding a Feature to the Quadtree. Data insertion into a quadtree begins with a search from the
root of a tree. The following request is made: (quadtree containerFor: aFeature boundingBox)
addFeature: aFeature. Here quadtree is an instance of SpatialDataManager, so it represents the quadtree in
which the item is being placed. It is asked to find a minimum cell that can fully contain the MBR of a
spatial object, aFeature boundingBox. The containerFor: method first checks a cell to see if it can contain
the feature’s MBR by calling the method aCell canContainBoundingBox: aFeature boundingBox. If aCell
can contain the feature’s MBR, then the method aCell containerForBoundingBox: aFeature boundingBox
is called. This method checks the cell and its children cells to determine the lowest level cell in which to
place the feature. Now that the appropriate cell is found, aFeature is added to the features collection of the
corresponding cell’s container.

A SPATIAL INDEXING FRAMEWORK FOR GEOGRAPHIC DATA STORAGE AND RETRIEVAL

As mentioned previously, the MBR is used to add, retrieve, and remove any of the data type features
from a quadtree. The calculation method for determining whether a feature’s MBR intersects a quadcell’s
MBR is as follows:

featMBR intersects: cellMBR

^ ((featMBR origin <= cellMBR corner) and: (cellMBR origin <= featMBR corner))

The origin represents the lower left point of the MBR and the corner represents the upper right point of
the  MBR.  Notice  that  this  method  is  a  simple,  fast  calculation  to  determine  whether  a  feature  should  be 
within  a  quadcell. 
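
For readers less familiar with Smalltalk, the same test can be written out coordinate by coordinate. This sketch assumes that the Smalltalk point comparison `<=` holds only when both x and y satisfy it, which is the usual Point behavior; the tuple layout ((origin_x, origin_y), (corner_x, corner_y)) and the function name are choices made for this example.

```python
def intersects(feat_mbr, cell_mbr):
    """True when two MBRs overlap or touch; each is ((origin_x, origin_y), (corner_x, corner_y))."""
    (f_ox, f_oy), (f_cx, f_cy) = feat_mbr
    (c_ox, c_oy), (c_cx, c_cy) = cell_mbr
    # featMBR origin <= cellMBR corner, compared on both axes
    origin_ok = f_ox <= c_cx and f_oy <= c_cy
    # cellMBR origin <= featMBR corner, compared on both axes
    corner_ok = c_ox <= f_cx and c_oy <= f_cy
    return origin_ok and corner_ok
```

Only four comparisons are needed, and rectangles that merely share an edge count as intersecting because the comparisons are inclusive.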

5.  Retrieving  a  Feature  from  the  Quadtree.  Data  retrieval  from  the  quadtree  also  begins  with  a 
search from the root of the tree. The method call quadtree returnAllFeatures will return a collection of all
of the data objects that are located within the quadtree. The method call quadtree returnNumberOfFeatures
will return only the number of data objects in the quadtree. To obtain a collection of all features that are
located within a given area of interest (AOI), the method quadtree returnSetOfIntersectingFeatures:
aBoundingBox is called, where aBoundingBox is the MBR of the AOI.

6.  Removing  a  Feature  from  the  Quadtree.  Data  deletion  from  a  cell  simply  removes  the  data  from 
the  cell’s  features  collection.  The  feature  to  be  removed  is  retrieved  from  the  quadtree  using  one  of  the 
methods  described  above,  and  then  is  deleted  from  the  features  collection.  If  the  removal  of  the  feature 
causes  the  features  collection  to  become  empty,  the  instantiated  quad  cell  stays  intact.  Note  that  removal  of 
features  from  a  quadtree  is  a  rare  occurrence  given  the  nature  of  geographic  data. 
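
The add, retrieve, and remove operations of Sections 4 to 6 can be sketched together as follows. This is a Python approximation for illustration, not the Smalltalk GIDB code: the class and attribute names follow the text, but the method names in snake case, the Feature class, the numbering of the four children, and counting the root as level 1 are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[Point, Point]          # (origin = lower left, corner = upper right)


def contains(outer: Box, inner: Box) -> bool:
    (oo, oc), (io, ic) = outer, inner
    return oo[0] <= io[0] and oo[1] <= io[1] and ic[0] <= oc[0] and ic[1] <= oc[1]


def intersects(a: Box, b: Box) -> bool:
    (ao, ac), (bo, bc) = a, b
    return ao[0] <= bc[0] and ao[1] <= bc[1] and bo[0] <= ac[0] and bo[1] <= ac[1]


@dataclass
class Feature:                     # hypothetical stand-in for a GIDB feature
    name: str
    bounding_box: Box


class SpatialContainer:
    def __init__(self, cell):
        self.cell = cell           # the cell that references this container
        self.features: List[Feature] = []


class SpatialDataCell:
    def __init__(self, manager, origin, corner, level=1, super_cell=None):
        self.manager, self.super_cell, self.level = manager, super_cell, level
        self.origin, self.corner = origin, corner
        self.container = SpatialContainer(self)
        self.children: List[Optional["SpatialDataCell"]] = [None] * 4

    def quadrants(self) -> List[Box]:
        (x0, y0), (x1, y1) = self.origin, self.corner
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        return [((x0, y0), (mx, my)), ((mx, y0), (x1, my)),
                ((x0, my), (mx, y1)), ((mx, my), (x1, y1))]

    def container_for(self, box: Box) -> SpatialContainer:
        if self.level < self.manager.max_level:
            for i, quad in enumerate(self.quadrants()):
                if contains(quad, box):
                    if self.children[i] is None:   # create a child only when needed
                        self.children[i] = SpatialDataCell(
                            self.manager, quad[0], quad[1], self.level + 1, self)
                    return self.children[i].container_for(box)
        return self.container      # box straddles a dividing line, or depth limit reached

    def collect(self, box: Box, out: List[Feature]) -> None:
        if not intersects((self.origin, self.corner), box):
            return                 # prune whole subtrees outside the AOI
        out.extend(f for f in self.container.features
                   if intersects(f.bounding_box, box))
        for child in self.children:
            if child is not None:
                child.collect(box, out)


class SpatialDataManager:
    def __init__(self, max_level: int = 20):
        self.max_level = max_level
        self.top_cell = SpatialDataCell(self, (-180.0, -180.0), (180.0, 180.0))

    def container_for(self, box: Box) -> SpatialContainer:
        if not contains((self.top_cell.origin, self.top_cell.corner), box):
            raise ValueError("bounding box lies outside the root cell")
        return self.top_cell.container_for(box)

    def add_feature(self, feature: Feature) -> None:
        self.container_for(feature.bounding_box).features.append(feature)

    def intersecting_features(self, aoi: Box) -> List[Feature]:
        found: List[Feature] = []
        self.top_cell.collect(aoi, found)
        return found

    def remove_feature(self, feature: Feature) -> None:
        # The emptied cell is left in place, as described in Section 6.
        self.container_for(feature.bounding_box).features.remove(feature)
```

Because every stored object carries an MBR, the same three methods serve VPF, RPF, TPS, and multimedia objects without any type-specific branching.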

7.  Conclusions.  In  this  paper,  we  have  described  various  data  types  that  are  stored  in  the  GIDB.  Use 
of  the  MBR  in  our  database  allows  for  storage  of  these  multiple  data  types  within  the  same  quadtree  design. 
Based  on  the  success  of  our  GIDB  prototype,  we  have  found  that  a  quadtree  organization  for  spatial 
indexing is ideal for fast feature storage and retrieval in an object-oriented environment.

ABSTRACT

The Digital Mapping, Charting and Geodesy Analysis
Program (DMAP) of the Naval Research Laboratory has
developed an object-oriented (OO) digital mapping
database prototype, called the Geospatial Information
Database (GIDB), capable of importing any of the
National Imagery and Mapping Agency's Vector Product
Format data and converting the data into an object format.
The DMAP Team has also investigated existing OO
technology that would allow the transfer and retrieval of
data from the GIDB over the internet. This article
describes how we have used a Java applet and the
CORBA standard to succeed in our endeavor to make
mapping data from our GIDB accessible over the World
Wide Web.

KEY WORDS: Java, Smalltalk, CORBA, Object-Oriented, Mapping, Database

1  INTRODUCTION 

The Digital Mapping, Charting and Geodesy Analysis
Program (DMAP) of the Naval Research Laboratory
began an effort in 1994 to develop an object-oriented
(OO) digital mapping database [1]. The OO data model is
derived from a relational data model known as Vector
Product Format (VPF), a digital mapping standard
produced by the National Imagery and Mapping Agency
(NIMA) [2]. The need for the OO data model stems from
the difficulty in representing complex geographic data by
decomposing the data to fit the flat file structure imposed
by the relational model. The OO prototype that has been
developed, called the Geospatial Information Database
(GIDB), is capable of importing any of NIMA’s VPF
products and converting the data into an object format for
display, query, and updating purposes [3]. This system
has been extended to include OO models for NIMA’s two
other families of digital products, Raster Product Format
(RPF) and Text Product Standard (TPS).


In early 1997, the DMAP Team began to research
existing OO technology that would allow the transfer and
retrieval of data from the GIDB over the internet. The
emergence of Java as the OO language of choice for
internet applications led us to seek a way to use a Java
interface to access our GIDB, which is written in
Smalltalk. The solution was found in the Common Object
Request Broker Architecture 2.0 (CORBA 2.0) standard,
which allows for interoperability between different OO
languages and across multiple platforms.

Background  information  on  CORBA  and  Object 
Request  Brokers  (ORBs)  is  given  in  section  2.  Section  3 
gives  an  overview  of  our  system  design,  and  section  4 
describes  the  Interface  Definition  Language  (IDL)  for  our 
map  features.  This  is  followed  by  a  summary  of  the 
available  functions  of  our  application  on  the  web  in 
section 5. The implications of our system to the mapping
community,  as  well  as  a  discussion  of  future  work,  are 
found  in  section  6. 

2  CORBA 

CORBA  was  developed  by  the  Object  Management 
Group  (OMG),  a  consortium  of  software  developers  and 
users.  In  1994,  OMG  released  the  CORBA  2.0  standard, 
which  specifies  a  middleware  architecture  for  distributed 
object communication [4]. CORBA is a nonproprietary,
industry-supported  standard  which  has  experienced  much 
growth  in  recent  years.  Several  of  its  benefits  include 
scalability,  vendor  independence  through  interoperability, 
language  and  platform  transparency,  and  support  for 
reuse. 

The  main  component  of  the  CORBA  specification  is 
the  Object  Request  Broker  (ORB).  The  ORB  is 
responsible  for  intercepting  a  request,  locating  the  object 
for  handling  the  request  and  invoking  the  correct  method 
on that object. This often involves converting parameters
from  a  common  data  type  to  a  language-specific  data  type 
and  vice  versa  (a  process  known  as  marshalling  and 
unmarshalling),  as  well  as  returning  results  from  the 
invoked method. Any two ORBs that are CORBA 2.0
compliant can provide communication between their
application objects, regardless of programming language
or platform.
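
Marshalling and unmarshalling can be illustrated with a toy example. The sketch below packs a bounding box into a fixed byte layout and restores it on the receiving side. The layout (four big-endian 32-bit floats) and the function names are assumptions chosen for illustration; this is not the Common Data Representation that IIOP specifies, and a real ORB performs this step automatically.

```python
import struct


def marshal_bounding_box(origin, corner):
    """Pack two (x, y) points into 16 bytes of big-endian 32-bit floats."""
    return struct.pack(">4f", origin[0], origin[1], corner[0], corner[1])


def unmarshal_bounding_box(payload):
    """Restore the (origin, corner) pair from the 16-byte payload."""
    ox, oy, cx, cy = struct.unpack(">4f", payload)
    return (ox, oy), (cx, cy)
```

The point of the agreed byte layout is that neither side needs to know how the other language represents a point in memory.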

In our development, we use Visigenic’s Visibroker as
our Java ORB, and GemStone's GemOrb as our Smalltalk
ORB. These two vendor ORBs allow communication
between applications via CORBA’s Internet Inter-ORB
Protocol (IIOP). The mechanism through which objects
are accessed through the IIOP and between ORB vendors
is with the interoperable object reference (IOR). The
IOR, which is managed by the ORBs and is invisible to
application programmers, includes the ORB’s internal
object reference, Internet host address, and port numbers.
It is the use of the IOR that allows for vendor-independent
interoperability.

The Interface Definition Language (IDL) is perhaps
the most significant component of the CORBA standard.
The IDL is a universal notation for software interfaces: it
provides the common definitions through which objects in
different programming languages can communicate. The
syntax of IDL is very similar to C++, but objects defined
in IDL can be implemented in any programming language
that has CORBA bindings, such as Smalltalk, Java, C++,
and Ada.

3  PROJECT  OVERVIEW 

The  goal  of  the  project  was  to  have  an  internet  Java- 
based  mapping  client  which  would  provide  display  and 
query  capabilities  for  a  set  of  geographic  objects,  such  as 
raster  images  and  vector  features  [5].  These  geographic 
objects  would  be  retrieved  from  a  Smalltalk  GemStone 
OO database management system (OODBMS) acting as a
server.  Communication  between  the  Java  applet,  which 
is  embedded  in  an  HTML  document  on  a  web  page,  and 
the  Smalltalk  application  is  accomplished  using  CORBA- 
compliant  vendor  ORBs.  The  use  of  Visibroker  and 
GemOrb  is  completely  transparent  to  anyone  accessing 
the  applet.  Figure  1  shows  the  basic  architecture  of  our 
system. 


Figure 1. Basic system design.

The retrieval of features from the server database is
based on the concept of area of interest (AOI). The first
screen of the applet displays a world map from which the
user can select a location graphically through the use of a
rectangle (bounding box). The user also has the option of
entering the coordinates for the AOI manually. From the
user input, a bounding box of the AOI is transmitted from
the applet via CORBA to the Smalltalk server. The server
responds with a set of database and library names for
which data is available in that region. NIMA provides
VPF data in databases, and each database contains one or
more libraries. The user then selects a database and
library, resulting in a list of coverages and feature classes
being returned from the server through another CORBA
request, as shown in figure 2.

Finally, the user selects the feature classes of interest
and submits a request for them to be displayed. This
request results in another CORBA communication, and
the server returns a set of all of the features of the
requested classes which are located in the given AOI.
These features that are returned are complex objects with
both geometric (coordinate) and attribute information.
The applet can then display, select, and query on the
returned features, as shown in figure 3. The available
functions that have been implemented from within the
applet will be discussed in detail in section 5.

4  INTERFACE  DEFINITION  LANGUAGE 

The  IDL  is  essential  for  communication  between 
different  programming  languages.  An  IDL  file  must  be 
created  to  allow  for  correct  mappings  of  objects  from  one 
application  to  another:  it  is  the  means  by  which  potential 
clients  determine  what  operations  are  available  and  how 
they should be invoked. In our system, our IDL file
defines  all  of  the  objects  that  are  common  to  both  client 
and  server,  as  well  as  methods  that  can  be  invoked  to 
perform  certain  operations  on  the  server. 


The first object that is utilized by both the Java client
and the Smalltalk server is a bounding box for the AOI, an
instance of the VPFBoundingBox class. To define our
VPFBoundingBox object, we use a struct data type which
allows related items to be grouped together. For example,
struct Point {float x, y;}; defines a point to be made up of
two float values, x and y. Similarly, our
VPFBoundingBox is defined to be composed of two
points, an origin and a corner: struct VPFBoundingBox
{Point origin; Point corner;};. We then defined an
interface called Integrated, which contains the methods
on the server that are invoked by the client. An interface
is the most important aspect of IDL, since it provides all
the information needed for a client to be able to interact
with an object on the server. Shown below is our
interface from the IDL file.
interface Integrated {

    StringCollection ReturnDatabasesForAOI(in
        VPFBoundingBox aBB);

    StringCollection ReturnCoveragesForAOI(in
        string dbName, in string libName);

    FeatureCollection ReturnFeatures(in
        StringCollection featColl, in
        VPFBoundingBox aBB, in string dbname, in
        string libname, in string covname);
};

Note that the interface contains the method names with
their parameters, as well as the data type of the returned
object.


The most complex structure defined in our IDL is the
struct CORBAVPFFeature.
struct CORBAVPFFeature {

    string featname; string dbname;

    AttributeCollection attributes;

    Coordinates coords; long id;
    string covname; string libname;

    VPFBoundingBox boundingBox; };
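
To make the shape of these IDL types concrete, the sketch below mirrors them as Python dataclasses and adds an in-process stand-in for the ReturnFeatures operation. It is an illustration only: no ORB is involved, the IDL collection types are approximated with Python lists, and the class IntegratedStub, the snake-case method name, and the sample values used with it are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]             # mirrors: struct Point {float x, y;}


@dataclass
class VPFBoundingBox:
    origin: Point                       # lower left
    corner: Point                       # upper right

    def intersects(self, other: "VPFBoundingBox") -> bool:
        return (self.origin[0] <= other.corner[0] and self.origin[1] <= other.corner[1]
                and other.origin[0] <= self.corner[0] and other.origin[1] <= self.corner[1])


@dataclass
class CORBAAttribute:
    name: str
    value: str


@dataclass
class CORBAVPFFeature:
    featname: str
    dbname: str
    libname: str
    covname: str
    id: int
    coords: List[Point]                 # IDL Coordinates, approximated as a list
    bounding_box: VPFBoundingBox
    attributes: List[CORBAAttribute] = field(default_factory=list)


class IntegratedStub:
    """Hypothetical in-process stand-in for the server side of the interface."""

    def __init__(self, features: List[CORBAVPFFeature]) -> None:
        self._features = list(features)

    def return_features(self, feat_coll: List[str], a_bb: VPFBoundingBox,
                        dbname: str, libname: str, covname: str) -> List[CORBAVPFFeature]:
        # Keep features of the requested classes that fall in the AOI.
        return [f for f in self._features
                if f.featname in feat_coll
                and (f.dbname, f.libname, f.covname) == (dbname, libname, covname)
                and f.bounding_box.intersects(a_bb)]
```

Returning the whole list in one call reflects the reason the paper gives for using sequences: fewer messages pass between server and client.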


The Smalltalk application has an object class called
VPFFeature from which our CORBAVPFFeature is
derived. The Smalltalk VPFFeature class is more
complex and has many more attributes, such as
featureDef, notes, and prims. For our internet applet,
though, only those attributes that are needed for display
and user queries are defined as shown above.

Another structure that we define is
CORBAAttribute: struct CORBAAttribute {string name;
string value;};. The CORBAAttribute structure is used as
part of the CORBAVPFFeature structure and gives
attribute names and values for a given feature instance.
For example, a given building may have an attribute name
“Structure Shape of Roof” with attribute value “Gabled.”

The final data type included in our IDL file is a
sequence, which is similar to a one-dimensional array but
does not have a fixed length. We use sequences to reduce
the number of messages passed from server to client. The
size of each sequence is determined dynamically on both
the server and the client. The following sequences are
found in our IDL file.

typedef sequence<Point> Coordinates;
typedef sequence<string> StringCollection;
typedef sequence<CORBAVPFFeature>
FeatureCollection;

This IDL file must be compiled on both the client and
the server. On the server, the IDL is filed in, bindings to
objects are made appropriately, and new methods are
created. On the Java client, the process is similarly
performed via an IDL to Java mapping. Objects defined
in the IDL can then be referenced and used in both the
client and server code.

5 FUNCTIONALITY ON THE WEB


The  underlying  motivation  for  having  a  web-based 
Java client access our OO mapping database is to give end
users the ability to access and use NIMA data quickly and
efficiently. At the present time, users of NIMA data must
have software to view the data resident on their own
computer systems, and must obtain the data on CD-ROM
or other storage media. Our Java applet allows any user
with a Java-enabled web browser, such as Netscape 4.0 or
Internet Explorer 4.0, to access our GIDB over the
internet  and  display  NIMA  map  data  available  in  their 
area  of  interest.  In  addition  to  display  of  map  objects,  we 
have  extended  the  functionality  of  the  Java  client  to 
include  simple  queries,  individual  feature  selection,  zoom 
capabilities,  attribute  queries,  geometrical  queries,  and 
updates  of  attribute  values.  These  functions  were 
available  in  our  stand-alone  Smalltalk  application,  and 
have  been  adapted  to  our  Java  interface. 

After  the  selected  features  in  a  user’s  AOI  have  been 
returned  to  the  Java  client  and  displayed,  the  user  can 
change  the  colors  of  the  features  to  distinguish  between 
the feature classes retrieved. A color key is shown (figure
i) providing the color, feature class, and number of those

ABSTRACT

We  have  previously  considered  issues  relative  to  conflation  in  a  geographic  information  system  (GIS) 
based  on  the  Vector  Product  Format  (VPF)  developed  by  the  National  Imagery  and  Mapping  Agency 
(NIMA)  as  the  Department  of  Defense  relational  database  interchange  standard  for  vector  mapping.  In  this 
context  we  developed  a  knowledge-based  system  to  provide  input  on  both  spatial  and  non-spatial  properties 
of  geographic  features.  Degrees  of  matching  were  generated  for  candidate  features  based  on  measurable, 
objective  measures  as  well  as  subjective  ones.  Matches  for  non-spatial  properties  (attributes)  were 
generated  on  the  basis  of  similarity  tables  developed  for  the  allowed  VPF  attribute  sets.  These  tables  were 
used  in  conjunction  with  a  fuzzy  combination  function  to  provide  the  overall  degree  of  matching  of  the 
candidate  features’  attribute/value  sets.  An  expert  system  also  generated  weights  for  the  combination 
function  using  rules  that  represented  semantic  interrelationships  of  feature  attributes. 

We  are  currently  developing  a  distributed  object-oriented  spatial  database  at  the  Naval  Research 
Laboratory.  Within  this  context,  conflation  is  a  very  evident  concern,  as  the  distributed  environment  entails 
possible  issues  of  schema  (metadata)  merging,  as  well  as  specific  feature  data  conflation.  Our  previous 
approach  using  an  expert  system  for  conflation  is  not  directly  feasible  in  a  distributed  environment.  Hence, 
we  have  developed  a  model  of  conflation  tailored  to  a  distributed  environment,  for  which  uncertainty  issues 
handled  by  our  previous  conflation  system  are  fully  accounted. 


INTRODUCTION 

In  recent  years,  the  trend  for  most  types  of  information  systems  has  been  a  move  to  a  more  loosely 
coupled,  distributed  nature.  The  maturity  of  client-server  architectures  and  software,  as  well  as  the  virtual 
explosion  of  web-based  systems,  has  demonstrated  the  enormous  advantages  of  distributed  systems. 
Furthermore,  the  advent  of  successful  middleware  technology  such  as  the  CORBA  2.0  standard  and 
corresponding  vendor  implementations  has  drastically  reduced  barriers  to  communication  and  data  sharing 
among  heterogeneous  software  and  hardware  systems. 

The interest in distributed geographic information systems (GIS) is no less than that of general
information systems; however, the uniqueness of the nature of spatial data makes the issue of true
interoperability of GIS a major research concern. Evidence of this is abundant in the literature, and is also
illustrated by initiatives such as the Open Geodata Interoperability Specification (OGIS) work by the Open
GIS Consortium, Inc., as well as University Consortium on Geographic Information Science (UCGIS)
priority research panels on “Interoperability of Geographic Information” and “Spatial Data Acquisition and
Integration.”

To  set  the  context  for  our  following  work  on  conflation,  we  first  define  several  of  the  most  frequently 
used terms and their interrelationships within the general scope of GIS interoperability. These terms
(interoperability, integration, conflation and fusion) are often used to convey very different ideas, or
alternatively,  used  so  loosely  as  to  be  somewhat  interchangeable.  Therefore,  clarification  of  the  use  of 
these  terms  in  this  paper  will  be  beneficial.  Table  1  shows  the  3-tier  hierarchy  illustrating  our  use  of  these 
terms. 

At  the  lowest  level  of  the  hierarchy  is  the  concept  of  data  integration.  In  keeping  with  the  most 
widespread  use  of  this  term,  e.g.  (Flowerdew,  1991),  our  use  of  data  integration  is  intended  to  convey  the 
idea of some process whereby incompatibilities among varying spatial data formats are resolved, allowing
the  various  data  types  to  be  simultaneously  analyzed/displayed/processed  by  a  GIS.  Data  integration  is 
therefore  a  low-level  transformation  procedure  that  requires  no  semantic  knowledge  of  the  various  data. 
Integration  of  data  types  can  be  considered  within  the  context  of  a  single  GIS,  for  example,  the  integration 
of  vector  and  raster  data  for  display  purposes,  or  as  part  of  a  distributed  system. 

Conflation  is  a  higher-level  concept  than  integration,  because  it  implies  a  deeper  (semantic  and 
“intelligent”)  knowledge  about  the  data.  Conflation  results  in  a  state  of  harmony  among  various  data 
sources  in  which  a  single,  “best”  view  of  multiple  data  representations  for  similar  data  types  is  presented  to 
the  user.  Thus,  conflation  logically  can  occur  only  if  integration  as  defined  earlier  has  already  been 
resolved.  Beyond  conflation,  which  is  viewed  as  an  issue  only  among  similar  types  of  geographic 
information,  e.g.,  vector  with  vector,  the  concept  of  data  fusion  is  the  more  generic  idea  of  combining 
widely  varying  forms  of  data,  e.g.,  multimedia,  in  a  system  that  can  effectively  organize  the  information  in 
a  way  that  is  of  benefit  to  the  user.  This  concept  of  the  “omni-informational”  GIS  is  discussed  in 
(Shepherd,  1991). 

Finally,  “interoperability”  is  viewed  as  the  ultimate  goal,  encompassing  all  aspects  of  representation 
and  semantic  integration  and  providing  a  truly  seamless  view  of  geographic  data  in  all  its  many  forms.  A 
UCGIS  white  paper  (available  at  http://www.ucgis.org/)  notes  several  long-term  goals  related  to 
interoperability,  including  machine-interpreted  semantics  of  geographic  data,  improved  semantic 
representation  for  the  data,  language  support  for  communication  of  geographic  information  and  the 
development  of  canonical  data  models  of  geographic  information. 


                   Hierarchy of terms    Examples

                   1. Fusion             Image + Text + Video + Vector + Raster + ...
Interoperability   2. Conflation         Bridge representation 1 + Bridge representation 2 -> “Best” bridge representation
                   3. Interchange        Proprietary vector format 1 <-> Proprietary vector format 2

Table 1. Hierarchy and examples of terminology.


We view our work in conflation as being most relevant to the last of these goals, the development of a
canonical data model. Though the issue is not specifically addressed in this paper, our approach to
conflation using a generic object model has obvious implications to issues of semantic schema integration
and meaningful data interchange, as well as other research concerns in this area.

The  following  section  gives  pertinent  background  information  on  conflation  research  in  general  and 
within  the  context  of  our  work.  Following  that  is  a  discussion  of  aspects  of  uncertainty  as  related  to  feature 
matching  in  a  distributed  environment.  The  object  conflation  model  is  then  presented  in  its  general  form, 
together  with  an  example  that  is  used  to  illustrate  specific  matching  capabilities  utilizing  previously 
developed  techniques  for  reasoning  under  uncertainty.  Our  summary  and  plans  for  future  work  are  then 
given. 


BACKGROUND  AND  RELATED  WORK 


Conflation 

Conflation  is  typically  regarded  as  the  combination  of  information  from  two  digital  maps  to  produce  a 
third map that is better than either of its component sources. The history of map conflation goes back to the
early to mid-1980s. The first clear development and application of an automated conflation process
occurred  during  a  joint  United  States  Geological  Survey  (USGS)-Bureau  of  the  Census  project  designed  to 
consolidate  the  agencies'  respective  digital  map  files  of  U.S.  metropolitan  areas  (Saalfeld,  1988).  The 
implementation  of  a  computerized  system  for  this  task  provided  an  essential  foundation  for  much  of  the 
theory  and  many  of  the  techniques  used  today.  Since  that  time,  others,  including  commercial  GIS  vendors, 
have  implemented  conflation  tools  within  their  applications.  For  an  example  of  commercial  work  on 
conflation,  see  (Siegel,  1995). 

Conflation  can,  in  brief,  be  viewed  as  a  multi-step,  iterative  process  that  involves  feature  matching, 
positional  re-alignment  of  component  maps  and  attribute  deconfliction  of  positively  identified  feature 
matches.  Feature  matching,  simply  and  perhaps  somewhat  obviously  stated,  involves  the  identification  of 
features  from  different  maps  as  being  representations  of  the  same  geographic  entity.  Positional  alignment 
is  a  mathematical  procedure  in  which  previously  identified  matching  features  are  brought  into  spatial 
agreement,  while  deconfliction  is  a  process  in  which  contradictions  in  a  matching  pair’s  attributes  and/or 
values  are  resolved.  Positional  alignment  and  deconfliction  are  both  steps  that  are  performed  after  a 
positive  match  has  been  determined.  As  such,  it  is  easy  to  see  that  accurate  feature  matching  results  are 
essential  to  the  overall  quality  of  the  resulting  conflated  map.  Because  of  this  dependency,  our  work  thus 
far  has  concentrated  solely  on  feature  matching  aspects  of  conflation.  Our  motivation  for  this  work 
includes  an  effort  to  make  individual  geographic  features  "intelligent"  enough  to  know  when  and  how  to 
conflate  themselves  in  a  distributed  environment. 


Original  Conflation  Project 

Our  original  work  on  conflation  (Cobb  et  al.,  1998a)  was  performed  within  the  context  of  the  Digital 
Mapping,  Charting  &  Geodesy  Program  (DMAP)  at  the  Naval  Research  Laboratory  (NRL),  Stennis  Space 
Center, Mississippi. DMAP was established in the 1980s as an avenue for providing comprehensive
reviews of the National Imagery and Mapping Agency’s (NIMA) newly developed digital mapping
products. DMAP reviews of these products in the mid-1990s led to the recommendation of an
object-oriented (OO) data model. Subsequently, DMAP received funds to develop an OO prototype for various
NIMA products. This original prototype, Object Vector Product Format (OVPF), has since been
significantly expanded and is now capable of processing various other vector data types, as well as raster
and text data. The implementation also includes an OO database management system (ODBMS) integrated
with  an  area-of-interest,  web-based  query  model.  Critical  stages  of  the  system  development  are 
documented  in  (Arctur  et  al.,  1995;  Shaw  et  al.,  1996;  Shaw,  Chung  and  Cobb,  1998;  and  Cobb  et  al., 
1998b). 

(Cobb et al., 1998a) represents the first time that methods for reasoning under uncertainty have been
utilized in a conflation process. The intent was to design a system that would mimic the reasoning process
of a human expert in conflation. A proof-of-concept system using NRL’s previously developed OO
prototype for spatial data (now known as the Geospatial Information Database, or GIDB), in conjunction
with the Nexpert commercial expert system by Neuron Data, Inc., was implemented to demonstrate the
effectiveness of the techniques for portions of attribute and shape similarity matching. More details of this
system can be found in (Foley, 1997a-b).

The  scope  of  the  original  project  was  NIMA's  Vector  Product  Format  (VPF)  data  (DMA,  1993).  The 
VPF  is  a  vector  data  standard  that  describes  a  general  hierarchical  model  for  storage  and  interchange  of 
attributed  vector  data.  Levels  of  the  hierarchy  include,  top  to  bottom:  database ,  which  contains  multiple 
libraries ,  each  of  which  contains  multiple  coverages ,  each  with  multiple  feature  classes ,  each  containing 
the  actual  geographic  feature  data.  Within  the  VPF  guidelines,  various  product  types  exist  that  adhere  to 
the  VPF  standard,  and  add  further  restrictions  to  data  content  and  format.  A  VPF  database  is  defined  by  the 
product  standard  and  a  spatial  extent;  a  library  is  defined  by  extent  and  scale,  while  a  coverage  is  defined 
by  thematically  and  topologically  related  features.  Examples  of  VPF  products  include  Digital  Nautical 
Chart  (DNC),  Vector  Smart  Map  (VMAP)  and  World  Vector  Shoreline  Plus  (WVS+).  Each  VPF  product 
is  designed  to  serve  a  particular  user  need. 
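
The containment hierarchy described above can be illustrated with a small data-structure sketch. This is not the VPF storage format or any NRL implementation; the class names, field names, extent layout and sample values are assumptions chosen only to mirror the database, library, coverage, feature class and feature levels named in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

# Hypothetical extent layout: (min_lon, min_lat, max_lon, max_lat)
Extent = Tuple[float, float, float, float]


@dataclass
class Feature:
    """One geographic feature with its attribute codes and values."""
    feature_id: str
    attributes: Dict[str, object] = field(default_factory=dict)


@dataclass
class FeatureClass:
    name: str
    features: List[Feature] = field(default_factory=list)


@dataclass
class Coverage:
    """Groups thematically and topologically related feature classes."""
    theme: str
    feature_classes: List[FeatureClass] = field(default_factory=list)


@dataclass
class Library:
    """Defined by a spatial extent and a scale."""
    extent: Extent
    scale: int
    coverages: List[Coverage] = field(default_factory=list)


@dataclass
class Database:
    """Defined by a product standard and a spatial extent."""
    product: str
    extent: Extent
    libraries: List[Library] = field(default_factory=list)

    def all_features(self) -> Iterator[Feature]:
        """Flatten the hierarchy down to individual features."""
        for library in self.libraries:
            for coverage in library.coverages:
                for feature_class in coverage.feature_classes:
                    yield from feature_class.features


# Illustrative content only; the values are made up.
bridge = Feature("bridge-1", {"length": 120.0})
db = Database(
    product="DNC",
    extent=(-90.0, 29.0, -88.0, 31.0),
    libraries=[Library(
        extent=(-90.0, 29.0, -88.0, 31.0),
        scale=50000,
        coverages=[Coverage("transportation", [FeatureClass("bridges", [bridge])])],
    )],
)
```

Flattening with `all_features` anticipates the position taken later in the paper, where conflation operates on individual features regardless of the database, library or coverage that contains them.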

The  VPF  standard  provided  a  unifying  framework  for  the  conflation  effort.  VPF  utilizes  geospatial 
data  standards  such  as  the  Feature  and  Attribute  Coding  Catalog  (FACC)  of  the  DIGEST  specification 
(DGIWG,  1994).  The  FACC  provides  standard  feature  names  for  geospatial  features,  as  well  as  attribute 
codes  and  encoded  and  non-encoded  representations  for  attribute  values.  Commonalities  such  as  this 
among  the  various  product  types  allowed  us  to  develop  a  conflation  model  that  was  valid  for  all  databases 
that  were  VPF-compliant.  Furthermore,  though  the  implementation  was  specific  to  VPF,  the  principles  of 
the  model  are  generally  applicable  to  other  representations  of  attributed  vector  data. 

As  mentioned  earlier,  the  focus  of  all  our  conflation  research  thus  far  has  been  feature  matching,  due  to 
the  dependence  of  other  phases  of  conflation  on  correctly  obtained  results  for  matched  features.  In  this 
paper,  we  extend  the  previously  developed  conflation  model  by  developing  a  more  general  object-oriented 
approach  and  considering  the  model  in  a  distributed  environment. 


UNCERTAINTY IN A DISTRIBUTED ENVIRONMENT

Uncertainty and Feature Matching

The  assessment  of  feature  match  criteria  is  a  process  in  which  evidence  must  be  evaluated  and  weighed 
and  a  conclusion  drawn — not  one  in  which  equivalence  can  be  unambiguously  determined.  For  example, 
fuzzy concepts such as "closeness" of two features and "similarity" of attributes and feature groupings are
essential  for  determining  equivalence. 

In  fact,  feature  matching  can  be  considered  as  a  type  of  classification  problem.  That  is,  we  are  trying  to 
determine  whether  one  feature  belongs  to  the  same  "class"  as  another;  in  this  case  the  class  is  defined  by  a 
set  of  two  features  that  are  believed  to  be  representations  of  the  same  real-world  entity.  This  type  of 
problem  can  be  handled  through  theories  of  evidential  reasoning  or  uncertainty,  such  as  fuzzy  logic  (Zadeh, 
1965)  or  Dempster-Shafer  theory  (Shafer,  1976).  These  theories  attempt  to  provide  likelihood  measures  for 
questions  based  on  available,  though  not  necessarily  conclusive,  evidence.  For  example,  in  feature 
matching,  the  question  we  must  consider  is,  "Based  on  the  available  evidence,  what  is  the  likelihood  (or 
probability)  that  feature  A  from  map  coverage  1  represents  the  same  entity  as  feature  B  from  map  coverage 
2?" 


Our approach to feature matching draws from aspects of both fuzzy set logic and evidential reasoning.
In particular, the assignment of matching scores for linguistic attributes is directly motivated by work in
modeling linguistic variables by the use of fuzzy sets. For example, we need to be able to determine the
semantic similarity of an attribute such as road surface type, which may have a value of ‘hard’ for one
feature and ‘asphalt’ for the potential matching feature. Obviously, the semantics of the two are not strictly
equivalent, but neither are they contradictory. Likewise, the idea of combining scores from the different
components of feature matching to arrive at a single matching score is very similar to techniques for the
combination of evidence used in evidential reasoning.
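
The ‘hard’ versus ‘asphalt’ case can be made concrete with a small lookup sketch. The table entries below are invented for illustration and are not values from the original system; the function name `linguistic_similarity` and the choice to treat unlisted pairs as 0.0 are likewise assumptions.

```python
# Hypothetical similarity degrees in [0, 1] for a "road surface type" attribute.
# Keys are unordered pairs so that the lookup is symmetric.
SURFACE_SIMILARITY = {
    frozenset(["hard", "asphalt"]): 0.8,
    frozenset(["hard", "concrete"]): 0.8,
    frozenset(["asphalt", "concrete"]): 0.6,
    frozenset(["hard", "gravel"]): 0.3,
    frozenset(["asphalt", "gravel"]): 0.2,
    frozenset(["concrete", "gravel"]): 0.2,
}


def linguistic_similarity(value_a, value_b, table=SURFACE_SIMILARITY, default=0.0):
    """Return a degree of match between two linguistic attribute values."""
    if value_a == value_b:
        return 1.0  # identical terms match fully
    return table.get(frozenset([value_a, value_b]), default)


score = linguistic_similarity("hard", "asphalt")
```

With these made-up entries, ‘hard’ and ‘asphalt’ receive a high but not perfect score, which reflects the observation that the two values are neither strictly equivalent nor contradictory.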


Uncertainty  and  Conflation  for  Distributed  Data 

Obviously,  issues  of  uncertainty  that  apply  to  conflation  within  a  single  system — such  as  those  for 
feature  matching — are  also  applicable  to  conflation  in  a  distributed  environment.  However,  we  believe 
additional  factors  related  to  the  general  topic  of  distributed  databases  increase  the  scope  of  uncertainty  that 
must  be  considered  in  this  context.  As  background,  we  can  draw  from  the  abundance  of  past  and  ongoing 
research  in  the  realm  of  schema  merging  for  conventional  (i.e.,  relational)  distributed  heterogeneous 
databases. An example of work in this area is (Lim, 1996).

The  general  concept  of  schema  merging  involves  resolution  of  incompatibilities  in  metadata.  These 
incompatibilities  may  be  either  structural  or  semantic  in  nature.  Structural  incompatibilities  involve  those, 
for  example,  in  which  attributes  for  representing  the  same  values  are  defined  differently.  These  may 
include  different  names  for  the  attributes,  or  different  domains  for  their  associated  values,  e.g.,  float  vs. 
integer.  Semantic  incompatibilities,  on  the  other  hand,  represent  those  cases  in  which  similarly  defined 
attributes  have  different  meanings  or  values.  For  example,  an  attribute  of  width  for  a  road  in  one  database 
may include the width of the road plus any associated rights-of-way, while the same attribute name in
another  database  may  only  imply  the  width  of  the  paved/driveable  portion  of  the  road.  Semantic 
incompatibilities  are  much  more  difficult  to  handle  automatically,  as  they  necessarily  imply  a  deeper 
understanding  of  the  data. 
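
The difference between the two kinds of incompatibility can be sketched in a few lines. The mapping tables and attribute names below are hypothetical; the sketch only shows that structural differences (attribute names, float vs. integer domains) yield to mechanical normalization, while the semantic difference in what "width" means survives it.

```python
# Hypothetical per-source attribute names mapped to one shared name.
NAME_MAP = {"wid": "width", "road_width": "width"}

# Hypothetical target domain for each shared attribute.
COERCE = {"width": float}


def normalize(record):
    """Resolve structural differences: attribute names and value domains."""
    normalized = {}
    for name, value in record.items():
        canonical = NAME_MAP.get(name, name)
        cast = COERCE.get(canonical)
        normalized[canonical] = cast(value) if cast else value
    return normalized


# Source A stores an integer under "wid"; source B stores a float under "road_width".
record_a = normalize({"wid": 12})
record_b = normalize({"road_width": 12.0})

# The records now agree structurally, but nothing here can tell whether one
# width includes the right-of-way and the other only the paved surface.
```

After normalization the two records compare equal, which is exactly why the semantic case is the harder one: the remaining disagreement is invisible at the level of names and types.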

It  is  clear  that  these  issues  are  very  similar  to  ones  that  must  be  faced  in  performing  conflation  in 
distributed  spatial  databases.  In  particular,  semantic  schema  integration  and  the  feature-matching  phase  of 
conflation  require  similar  levels  of  knowledge  regarding  the  meanings  that  various  data  are  intended  to 
convey.  Semantic  knowledge  is  inherently  uncertain,  as  interpretations  of  even  the  most  seemingly 
unambiguous  words  and  phrases  vary  among  individuals.  Similarly,  structural  differences  in  spatial  data 
representation for like features are to be expected in any distributed system composed of heterogeneous
data  sources.  It  is  evident  from  this  discussion  that  conflation  in  a  distributed  environment  can  be  viewed 
as  a  specific  application  of  issues  related  to  uncertainty  in  schema  merging. 


CONFLATION  MODEL 

As  defined  earlier  in  this  paper,  conflation  is  considered  to  be  the  process  of  combining  information 
from  two  map  sources  to  produce  a  better  map.  This  definition,  however,  arises  from  the  historical 
perspective  of  conflation  in  which  a  cartographer  manually  combines  information  from  multiple  paper 
maps.  In  the  digital  arena,  in  which  data  integration  can  virtually  eliminate  the  concept  of  a  standalone 
map,  one  of  the  first  issues  to  be  resolved  for  the  development  of  a  general  conflation  model  is,  “What 
constitutes  a  map  source?” 

From  the  perspective  of  the  VPF  standard,  as  well  as  similar  vector  data  models,  there  are  several 
obvious  choices.  Each  of  the  levels  in  the  VPF  database  hierarchy — database,  library,  coverage  and  feature 
class — could  be  considered  as  a  candidate  object  for  the  role  of  defining  a  map  source.  However,  to 
impose  such  a  definition  on  a  predefined  subset  of  data,  however  seemingly  natural  a  fit,  would  hinder  our 
efforts  in  progressing  to  the  ultimate  goal  of  truly  transparent  spatial  data  integration  and  interoperability. 
Furthermore,  conflation  decisions  made  at  any  of  these  levels  would  have  a  possibly  negative  impact  on 
fitness  for  use  of  the  end  result,  by  the  inclusion  or  exclusion  of  data  based  on  high-level  properties.  This  is 
obviously  undesirable,  as  fitness  for  use  is  an  extremely  significant  qualitative  measure  of  conflation 
success. 

Definition of a map source is more generally related to the question of when to perform conflation.
The following list gives examples of answers to this question:

•  Automatically, whenever a user requests data

•  Whenever data from two different overlapping databases, libraries, coverages, etc., are retrieved

•  Only when explicitly requested by the user

The  third  possibility  is  obviously  error-prone,  as  it  assumes  the  user  knows  the  cases  in  which  conflation  is 
an  issue.  The  second  preserves  artificial  categorizations  of  data  that,  as  mentioned  earlier,  are  barriers  to 
long-term  goals  of  data  independence,  integration  and  interoperability.  Therefore,  we  have  chosen  the 
automatic  system  model  represented  by  the  first  answer.  In  support  of  this,  our  concept  of  a  map  source  is 
no  longer  some  collection  or  sub-collection  of  geographical  data;  rather  each  individual  feature  is  analyzed 
separately  by  the  conflation  system.  More  discussion  on  the  implications  of  this  follows  in  the  presentation 
of  the  conflation  model. 

A  second  general  consideration  to  be  made  is  the  impact  of  data  distribution  on  a  general  conflation 
model.  The  issue  in  this  case  is  whether  to  tailor  the  model  to  fit  a  particular  distributed  environment,  or  to 
design  a  “one-size-fits-all”  model  that  effectively  eliminates  distribution  as  an  issue.  Although  thorough 
consideration  of  partitioning  schemes  for  spatial  data  is  crucial  for  optimal  performance  in  a  given  system, 
we  have  chosen  to  develop  a  conflation  model  at  the  logical  level.  Hence,  in  keeping  with  generally 
accepted  principles  of  distributed  (as  well  as  non-distributed)  database  design,  this  model  is  completely 
independent  of  physical  concerns  such  as  actual  data  partitioning.  However,  the  general  issue  of 
considering  conflation  within  a  distributed  system  versus  a  standalone  system  does  impact  design 
decisions,  even  at  an  abstract  logical  level.  The  primary  issue  that  we  consider  is  centralization  versus 
distributed  control  of  conflation  events.  A  more  in-depth  discussion  of  this  is  given  in  conjunction  with  the 
presentation  of  the  model. 


Introduction  to  the  Model 

The presentation of our logical design is based on an OO model. The OO paradigm is well accepted as
the prevailing method for the representation and manipulation of complex data such as geographical
information. Within an OO framework, one is able to define models of real-world data in ways analogous
to  those  in  which  we  intuitively  perceive  and  interpret  those  data.  As  a  general  introduction  to  the  subject, 
an  object  is  a  collection  of  data  (state)  and  methods  (behavior)  which  represents  the  properties  and 
processes  of  a  real-world  entity,  such  as  the  Chesapeake  Bay  Bridge  or  the  Mississippi  River.  A  class  is  a 
template  for  creating  new  objects  that  share  common  properties.  For  example,  there  may  be  a  bridge  class 
that  captures  generic  information  that  every  instance  of  that  class  (an  instantiation)  should  contain.  This 
would  most  likely  include  data  such  as  length,  height,  maximum  weight  limit,  and  type  (drawbridge, 
suspension,  etc.).  Examples  of  procedures  for  a  bridge  class  could  include  opening  and  closing. 

The  packaging  of  an  object’s  data  with  its  procedures  is  known  as  encapsulation.  Encapsulation 
allows  modifications/additions  to  the  system  with  minimal  impact  on  other  system  components.  This 
property  is  crucial  for  the  successful  development  and  maintenance  of  complex  software  systems  such  as 
GIS. Other major concepts for understanding OO models include inheritance and polymorphism.
Inheritance  is  the  automatic  inclusion  of  attributes  and  methods  from  classes  defined  at  a  higher  level  in  a 
class  hierarchy.  The  class  hierarchy  is  designed  with  more  general  classes  being  placed  at  higher  levels  and 
more  specific  classes  being  placed  at  lower  levels.  Inheritance  reduces  the  need  to  duplicate  data  and  code, 
while the structuring of a class hierarchy provides a logical organization of object "types." Polymorphism
allows methods for different objects to have the same name, while providing different implementations.
For example, both a circle and a square class could implement a method that calculates area. Both methods
could be named area, but the implementations would use the appropriate formula for either a circle or a
square. Methods that invoke the area method for an object would use one name; the class of the
receiving object would determine which implementation to use. Polymorphism helps to simplify system
design, and reduces coding complexity by eliminating error-prone if-then-else type-checking constructs.
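
The circle and square example can be written out directly. The class and function names here (`Shape`, `Circle`, `Square`, `total_area`) are illustrative choices, not names from the original work.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """General class at the top of a small hierarchy."""

    @abstractmethod
    def area(self) -> float:
        """Each subclass supplies its own formula under the same name."""


class Circle(Shape):
    def __init__(self, radius: float):
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Square(Shape):
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def total_area(shapes) -> float:
    """Calls area() by one name; the receiver's class picks the implementation."""
    return sum(shape.area() for shape in shapes)


combined = total_area([Circle(1.0), Square(2.0)])
```

Note that `total_area` contains no if-then-else test on the type of each shape, which is the simplification the paragraph above attributes to polymorphism.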

These  concepts  are  applied  to  our  work  in  the  following  ways.  Encapsulation  allows  each  geographic 
feature  to  maintain  its  own  state  of  knowledge  related  to  information  and  procedures  necessary  for 
conflation  with  another  feature.  Changes  in  the  algorithms/implementations  of  a  particular  feature’s 
conflation  abilities  do  not  affect  other  features.  Inheritance  allows  us  to  describe  general  types  of 
conflation knowledge applicable to multiple feature classes within a single class at a high level in the
hierarchy. Lower-level (more specific) classes then automatically inherit this knowledge. Finally,
polymorphism allows us to implement general conflation procedures that are valid for instances of any
feature class; thus, polymorphism greatly reduces the need to be concerned with the issue of "type." The
result  is  that  we  are  able  to  treat,  for  example,  railroad  features  and  building  features  in  the  same  manner  at 
a  logical  level.  Figure  1  shows  a  simplified  class  hierarchy  for  conflation. 


Figure  1.  General  class  hierarchy  for  conflation. 


Discussions and presentations at a recent assembly of the University Consortium for Geographic
Information  Science  (UCGIS)  lend  credence  to  the  primary  points  in  this  research.  (White  papers  of  the 
assembly  are  available  at  http://www.ucgis.org/.)  First,  the  priority  research  panel  on  "Extensions  to 
Geographic  Representation"  recommends  the  use  of  object-oriented  techniques  for  improved 
representational  power  of  geographic  data.  Second,  the  priority  research  panel  on  "Spatial  Data 
Acquisition  and  Integration"  states  in  their  white  paper  that  a  "general  theoretical  and  conceptual 
framework"  for  conflation  is  needed,  and  that  furthermore,  this  framework  should  allow  for  matches  of 
limited  confidence  (uncertainty). 


Conflation  Process  Overview 

The  NRL  GIDB  prototype,  within  which  proof-of-concept  implementation  of  this  model  is  currently 
being  performed,  is  centered  on  the  concept  of  spatial  range  queries,  also  known  as  area-of-interest  (AOI) 
queries.  Given  an  AOI,  through  definition  of  manual  or  graphical  bounding  box  coordinates  or 
geographical  place  name,  the  GIDB  is  able  to  return  any  vector,  raster  or  multimedia  data  available  for  that 
area.  Advanced  queries,  such  as  those  limiting  vector  data  attribute  values,  are  also  available. 

Our  conceptual  model  of  the  conflation  process  is  based  on  this  system  model  of  AOI  queries,  limited 
of  course  to  the  consideration  of  vector  data  only.  A  diagram  of  the  conflation  process  is  shown  in  figure  2. 
The  primary  point  to  note  is  that  the  process  takes  place  at  the  individual  feature  level,  irrespective  of  that 
feature’s  physical  residence,  or  logical  database,  library  or  coverage  inclusion.  In  the  first  step,  the  user 
selects  an  AOI.  The  query  manager  then  retrieves  all  feature  objects  from  the  distributed  database  that  fall 
within  the  AOI  (subject  to  any  other  constraints  imposed  by  the  user).  From  this  collection  of  objects,  the 
query  manager  randomly  selects  one  object  to  send  a  “conflate”  message.  That  object  follows  the  protocol 
for  determining  which,  if  any,  of  the  remaining  objects  in  the  query  collection  are  matching  representations 
for  itself.  Any  object  determined  to  be  a  match  is  placed  in  the  matching  feature  set  for  the  conflating 
object,  ranked  according  to  similarity  scores.  Objects  that  have  been  placed  in  a  matching  feature  set  are 
removed  from  the  general  query  collection,  so  as  not  to  be  candidates  for  subsequent  conflation  iterations. 
This  philosophy  implies  that  only  those  objects  that  have  scores  strongly  suggesting  matching  features  are 
placed  in  the  matching  feature  sets.  The  query  manager  continues  by  sending  “conflate”  messages  to  the 
remaining members of the query collection. When the process is complete, the query manager returns the
query collection for display. For any features with non-empty matching feature sets, the member of the set
with the highest quality score (NOT the same as the similarity score) is returned as the representative for
that feature.

Figure 2. Conceptual model for distributed conflation.


Several  points  on  the  conceptual  model  are  worth  noting  before  we  proceed  with  the  details.  First,  the 
responsibility  for  conflation  is  distributed.  Each  individual  feature  contains  the  inherited  knowledge 
needed  for  performing  feature  matching  for  that  particular  class  of  geographic  objects.  This  is  preferable  to 
a  centralized  conflation  system,  where  a  single  conflation  object  “manages”  the  process.  Such  a  system 
would  be  more  difficult  both  to  implement  and  maintain  due  to  differences  in  the  ways  in  which  objects 
from  various  feature  classes  should  be  matched,  as  well  as  the  ramifications  of  adding  new  features  and/or 
system  nodes  for  a  distributed  system.  Second,  the  process  is  automatic.  Because  the  query  manager 
collects  the  features  satisfying  the  user  query  and  invokes  the  conflation  method  for  each  object  before 
returning  the  final  set,  the  user  does  not  have  to  explicitly  request  conflation,  or  even  need  to  know  that 
conflation  is  an  issue.  Third,  when  the  results  are  returned  to  the  user,  only  a  single  representation  of  each 
object  is  presented.  For  now,  our  treatment  of  deconfliction  is  simply  to  select  the  “best”  feature  from  a 
matching  feature  set,  based  on  various  objective  and  subjective  criteria.  This  idea  of  simplifying  the  user’s 
concern  and  involvement  with  conflation  is  critical  for  end-user  based  systems,  as  the  whole  need  for  such 
an  automated  approach  is  derived  from  the  concept  of  non-expert  users.  Of  course,  this  model  could  easily 
be  extended  to  allow  the  user  more  involvement  when  desired. 
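
The query-manager loop described in this section can be sketched as follows. This is a minimal illustration, not the GIDB implementation: the `Feature` fields, the toy similarity function (same feature code, score decaying with distance), the 0.8 threshold, and the decision to let the conflating feature compete with its own matching set for the role of representative are all assumptions.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Feature:
    name: str
    code: str      # feature code (illustrative)
    x: float
    y: float
    rank: float    # quality score, distinct from the similarity score
    matching_set: list = field(default_factory=list)

    def similarity(self, other):
        """Toy similarity: 0 for different codes, else decays with distance."""
        if self.code != other.code:
            return 0.0
        dist = ((self.x - other.x) ** 2 + (self.y - other.y) ** 2) ** 0.5
        return max(0.0, 1.0 - dist / 100.0)

    def conflate(self, others, threshold=0.8):
        """Collect strong matches, ranked by similarity score."""
        scored = [(self.similarity(o), o) for o in others]
        matches = [(s, o) for s, o in scored if s >= threshold]
        matches.sort(key=lambda pair: pair[0], reverse=True)
        self.matching_set = [o for _, o in matches]
        return self.matching_set

    def representative(self):
        """Pick the highest-quality member (assumption: self competes too)."""
        return max([self] + self.matching_set, key=lambda f: f.rank)


def run_query(features_in_aoi, rng=random):
    """Send 'conflate' to randomly chosen features until none remain."""
    pending = list(features_in_aoi)
    results = []
    while pending:
        current = pending.pop(rng.randrange(len(pending)))
        matched = current.conflate(pending)
        # Matched features are removed so they are not conflated again.
        pending = [f for f in pending if all(f is not m for m in matched)]
        results.append(current.representative())
    return results


features = [
    Feature("bridge_a", "bridge", 0.0, 0.0, rank=0.6),
    Feature("bridge_b", "bridge", 3.0, 4.0, rank=0.9),
    Feature("building", "building", 1.0, 1.0, rank=0.7),
]
result_names = sorted(f.name for f in run_query(features, random.Random(0)))
```

In this sketch the two bridge representations collapse to the higher-ranked one whichever feature receives the first "conflate" message, so the user sees a single bridge and the unrelated building.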


The  Spatial  Conflation  Object 

The  spatial  conflation  object  (SCO)  is  shown  in  figure  1  as  the  top  of  the  conflation  hierarchy.  As 
such, the SCO is, in OO terms, an abstract superclass. Abstract superclasses do not have concrete instances
associated  with  them;  rather,  they  provide  a  mechanism  for  the  inheritance  of  generic  traits  that  apply  to 
each  of  the  subclasses.  The  SCO,  therefore,  provides  general  information  needed  for  any  type  of  feature 
conflation,  as  well  as  default  actions  to  take  during  the  process.  This  knowledge  is  not  expected  to  be 
complete  enough  to  allow  for  a  well-defined  conflation  procedure.  Instead,  this  knowledge  is  augmented 
with  specific  types  of  knowledge  for  feature  class  specializations.  For  example,  the  way  in  which 
geometric  similarity  is  determined  for  line  features  vs.  area  features  is  different.  The  common  trait  is  that 
the  general  idea  of  geometric  similarity  is  a  significant  factor  in  performing  conflation.  To  go  one  level 
further,  the  way  in  which  geometric  similarity  is  assessed  for  railroad  lines  vs.  river  lines  is  different, 
though  the  method  for  computing  the  measurement  is  the  same,  since  both  are  line  features.  To  summarize, 
the  general  knowledge  for  conflation  is  inherited  through  the  class  hierarchy;  at  each  level,  additional,  more 
concrete knowledge is added until there is sufficient knowledge for successfully performing conflation.

Figure 3 illustrates the set of attributes and methods for the SCO. The attributes are the top, italicized
set, while the methods are the lower, non-italicized set. This is not a complete set, but it is sufficient for
explanatory purposes. Two types of attributes are given, instance and class. These differ in that each
object instantiation of a feature subclass in the SCO hierarchy inherits the full set of instance attributes
and holds its own value for each one, whereas a class attribute has a single value that is accessible to all
instantiations of that class and its subclasses. Likewise, the methods are also inherited, although it is
likely that these will often be overridden by feature class-specific methods. The ones defined at this level
may be used as defaults, if no further information is available.

The  rank  attribute  represents  a  measure  of  quality  of  the  information  for  a  particular  feature.  It  is  a 
subjective  measure  based  on  quality  information  provided  in  the  features’  database  and  library,  as  well  as 
scale  and  fitness  for  use  as  determined  by  the  user’s  objectives.  Determination  of  this  value  is  one  of  the 
major  places  in  the  model  in  which  uncertainty  must  be  taken  into  account.  The  matchingSet  contains  all 
features  that  were  determined  to  be  matching  features  for  the  object  beyond  a  user-defined  or  default 
threshold. The RuleSet is a composite object consisting of rules and rule-processing strategies for
measuring  similarities  in  distance,  geometry,  attributes,  topology,  FACC  feature  codes,  and  other 
miscellaneous  criteria  for  determining  feature  matches.  Obviously,  this  is  another  area  in  which  techniques 
for  reasoning  under  uncertainty,  as  well  as  techniques  for  subjectively  evaluating  and  analyzing  vague, 
qualitative  information  must  be  used.  The  role  of  SimilarityTableSet  is  explained  in  a  subsequent  section. 

As  the  method  names  are  somewhat  self-explanatory,  we  will  only  expound  here  on  the  way  in  which 
these  are  used  in  the  conflation  process.  The  conflate  method  is  invoked  on  each  object  by  the  query 
manager,  and  sets  in  motion  a  sequence  of  actions.  We  assume  for  now,  for  the  sake  of  simplicity,  that 
each  object’s  rank  has  been  predetermined.  The  object  then  begins  the  process  to  determine  if  any  of  the 
other  objects  in  the  query  set  are  potential  matches  (findCandidates).  This  process  is  one  of  filtering  and 
refining.  That  is,  simple  measures  that  can  quickly  eliminate  non-matches  are  used  first,  e.g.,  completely 
different  feature  codes,  followed  successively  by  more  subtle  checks  such  as  attribute  set  and  value 
similarity.  Once  a  candidate  has  been  evaluated,  it  is  determined  whether  the  overall  matchingScore,  as 
determined by a function combining attributeScore, geometryScore, topologyScore and filterScore,
exceeds  the  threshold  level.  Those  that  do  are  placed  into  the  matchingSet.  The  best  of  these  features,  as 
determined  by  the  rank,  is  returned  to  the  query  manager  as  the  representative  for  that  feature 
(selectFeature). 
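
The sequence just described (filter, refine, score, threshold, select) can be sketched with an abstract superclass and one specialization. The method names follow those given in the text, rendered in Python style; everything else is an assumption made for illustration, including the weighted-sum combining function, the weights, the 0.75 threshold, the length-ratio geometry measure, and the sample feature codes and attribute values.

```python
from abc import ABC, abstractmethod


class SpatialConflationObject(ABC):
    # Class attributes: one value shared by all subclasses (assumed defaults).
    threshold = 0.75
    weights = {"attribute": 0.4, "geometry": 0.4, "topology": 0.2}

    def __init__(self, feature_code, attributes, rank):
        # Instance attributes: each object holds its own values.
        self.feature_code = feature_code
        self.attributes = attributes
        self.rank = rank
        self.matching_set = []

    def filter_score(self, other):
        """Cheap first pass: different feature codes rule out a match."""
        return 1.0 if self.feature_code == other.feature_code else 0.0

    def attribute_score(self, other):
        """Default: fraction of shared attributes that have equal values."""
        shared = set(self.attributes) & set(other.attributes)
        if not shared:
            return 0.0
        same = sum(self.attributes[k] == other.attributes[k] for k in shared)
        return same / len(shared)

    def topology_score(self, other):
        return 1.0  # placeholder default; a real measure would go here

    @abstractmethod
    def geometry_score(self, other):
        """Geometry differs by feature type, so subclasses must supply it."""

    def find_candidates(self, others):
        return [o for o in others if self.filter_score(o) > 0.0]

    def matching_score(self, other):
        w = self.weights
        combined = (w["attribute"] * self.attribute_score(other)
                    + w["geometry"] * self.geometry_score(other)
                    + w["topology"] * self.topology_score(other))
        return self.filter_score(other) * combined

    def conflate(self, others):
        scored = [(self.matching_score(o), o) for o in self.find_candidates(others)]
        kept = [(s, o) for s, o in scored if s > self.threshold]
        kept.sort(key=lambda pair: pair[0], reverse=True)
        self.matching_set = [o for _, o in kept]
        return self.select_feature()

    def select_feature(self):
        """Return the best feature by rank (assumption: self is a candidate)."""
        return max([self] + self.matching_set, key=lambda f: f.rank)


class LineFeature(SpatialConflationObject):
    def __init__(self, feature_code, attributes, rank, length):
        super().__init__(feature_code, attributes, rank)
        self.length = length

    def geometry_score(self, other):
        longest = max(self.length, other.length)
        return min(self.length, other.length) / longest if longest else 1.0


road_1 = LineFeature("road", {"surface": "hard", "lanes": 2}, rank=0.5, length=100.0)
road_2 = LineFeature("road", {"surface": "hard", "lanes": 2}, rank=0.9, length=95.0)
rail = LineFeature("railroad", {}, rank=0.7, length=100.0)
chosen = road_1.conflate([road_2, rail])
```

Here the railroad is eliminated by the inexpensive feature-code filter before any of the costlier scores are computed, and `road_2` is returned because it has the higher rank among the matched representations.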


Figure 3. Attributes and methods for the SCO.

INTEGRATION  OF  MONOLITHIC  CONFLATION  MODEL 

Previously,  we  have  developed  attribute  and  shape  similarity  measures  for  conflation  based  on 
principles of fuzzy logic and evidential reasoning (Foley, 1997a-b; Cobb et al., 1998a). The
implementation of the resulting methodology involved a three-tier architecture consisting of a Smalltalk
component for the user and OO database interface, a commercial expert system for rule processing and a C
language interface between the two. This setup worked well for a proof-of-concept implementation of a
single  system/database;  however,  with  the  current  emphasis  on  distributed  spatial  databases,  it  is  obvious 
that  this  initial  architecture  is  too  cumbersome  to  extend  to  a  distributed  realm.  Therefore,  in  this  section 
we  show  how  the  aforementioned  similarity  measures  can  be  migrated  to  and  utilized  in  the  distributed 
object  conflation  model  given  in  this  paper. 


Attribute  Matching  Algorithm 

For  attribute  matching,  each  feature  object  is  considered  as  a  set  of  attribute-value  pairs: 

\[ ((a_{11}, v_{11}),\ (a_{12}, v_{12}),\ \ldots,\ (a_{1n}, v_{1n})) \]

\[ ((a_{21}, v_{21}),\ (a_{22}, v_{22}),\ \ldots,\ (a_{2m}, v_{2m})) \]

We  consider  matching  for  the  two  categories  of  numeric  and  linguistic  attribute  domains.  In  general, 
matching  for  numeric  domains  is  handled  through  the  use  of  membership  matching  functions,  while 
matching  for  linguistic  domains  is  handled  through  the  use  of  attribute  similarity  tables.  For  the  purpose  of 
example,  we  will  consider  here  only  the  linguistic  domain. 

A similarity table for a specific attribute contains a value in the range [0,1] for each pair of attribute domain
values. Each of these values represents a degree of matching between two attribute values. In many cases,
the domain values are integers that represent encodings of linguistic characteristics; thus, the similarity
values in the table represent similarity between linguistic terms. Matching for features based upon attribute
similarity is a two-phase process. First, the similarity between each of the attribute values for the two
features is determined from a similarity table. Second, measures of semantic interrelationships between
and among the various attribute values are computed within a rule-based expert system environment.
Based on these interrelationships, the expert system returns one or more weights for increasing or
decreasing the matching score for various attributes.

As  an  example,  consider  a  railroad  feature  with  an  attribute  RRA,  representing  the  railroad  power 
source,  such  that  the  attribute  can  have  the  following  values. 

0    Unknown
1    Electrified Track
3    Overhead Electrified
4    Non-electrified
999  Other

The  similarity  table  for  RRA  is  given  in  table  2. 


RRA     0     1     3     4     999
0       .2
1       .2    1
3       .2    .6    1
4       .2    .1    .1    1
999     .2    .2    .2    .2    .2

Table 2. Similarity table for railroad power source.


An  example  of  a  production  rule  for  two  railroad  features,  RR1  and  RR2,  is: 

IF ((RR1.ltn = 3 and RR2.ltn = 2) and (RR1.rrc = 16 and RR2.rrc = 16))

THEN w_rrc <- 1.0 and w_ltn <- 0.5

where ltn represents the number of tracks and rrc represents railroad categories. The overall matching score for
attributes is given by:

\[ MS_a = \left( \sum_{k=1}^{N} \left[ sim_{A_k}(F_i, F_j) \times ESW_{A_k} \right] \right) / N \]

where A_k is the k-th attribute in both F_i and F_j; N is the number of attributes that are common to both F_i and F_j; and
ESW is the weight computed by the expert system. Explanations of the derivation of similarity values and semantic
production rules would require more depth than space will permit here; the reader is thus referred to the previously
mentioned references on this work for greater details.
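
As a worked illustration of the attribute matching score, the sketch below averages the weighted similarities over the attributes common to two features. The dictionary layout, the function names, the lower-case attribute key "rra", and the default weight of 1.0 when no rule fires are assumptions made for this example; only the similarity values come from table 2.

```python
# Lower triangle of the RRA similarity table (table 2), keyed by
# (larger code, smaller code).
RRA_SIMILARITY = {
    (0, 0): 0.2,
    (1, 0): 0.2, (1, 1): 1.0,
    (3, 0): 0.2, (3, 1): 0.6, (3, 3): 1.0,
    (4, 0): 0.2, (4, 1): 0.1, (4, 3): 0.1, (4, 4): 1.0,
    (999, 0): 0.2, (999, 1): 0.2, (999, 3): 0.2, (999, 4): 0.2,
    (999, 999): 0.2,
}


def similarity(table, v1, v2):
    # The table is symmetric, so order the pair before the lookup.
    return table[(max(v1, v2), min(v1, v2))]


def attribute_matching_score(f1, f2, tables, weights):
    """Average of similarity x expert-system weight over common attributes.

    f1 and f2 map attribute names to coded values; weights holds the
    expert-system weight per attribute (assumed 1.0 when absent).
    """
    common = [a for a in f1 if a in f2]
    if not common:
        return 0.0
    total = sum(similarity(tables[a], f1[a], f2[a]) * weights.get(a, 1.0)
                for a in common)
    return total / len(common)


rr1 = {"rra": 1}   # Electrified Track
rr2 = {"rra": 3}   # Overhead Electrified
score = attribute_matching_score(rr1, rr2, {"rra": RRA_SIMILARITY}, {})
halved = attribute_matching_score(rr1, rr2, {"rra": RRA_SIMILARITY}, {"rra": 0.5})
```

With a single common attribute, the score is simply the table entry 0.6 scaled by the weight, so a rule-generated weight of 0.5 lowers it to 0.3.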


Attribute  Matching  and  the  Distributed  Model 

We  begin  by  summarizing  the  contents  of  the  previous  section  for  illustrating  the  application  of  the  distributed 
model  to  this  work.  The  matching  of  attributes  for  linguistic-based  value  domains  involves:  (1)  a  similarity  table 
for  determining  matching  between  two  attribute  values;  (2)  production  rules  for  considering  semantic  implications 
of  two  or  more  attribute  values;  (3)  weights  generated  by  these  rules,  which  are  combined  with  previously 
determined  matching  scores;  and  (4)  an  attribute  matching  score  computed  over  the  set  of  common  attributes  for 
two  features. 

Beginning  with  the  first  component  in  this  list,  it  is  relatively  easy  to  see  that  the  idea  of  similarity  tables  for 
linguistic  attribute  matching  applies  across  all  feature  classes  and  representations.  Therefore,  an  attribute  for  this  is 
explicitly  provided  in  the  SCO  (see  figure  3).  The  SimilarityTableSet  attribute  is  shown  as  a  class  variable.  This 
variable  is  inherited  by  all  feature  classes,  e.g.,  bridges,  roads,  etc.,  each  of  which  has  a  unique  set  of  tables,  one  per 
attribute  of  that  class,  that  contains  values  representative  of  the  attribute  value  domain.  For  example,  the  similarity 
table  for  RRA  shown  as  table  2  would  be  a  member  of  the  SimilarityTableSet  for  a  railroad  feature  class.  Because 
it  is  a  class  variable,  only  one  copy  is  maintained  that  is  accessible  to  all  instances  of  that  class. 

The  incorporation  of  production  rules  into  the  model  is  a  complex  issue,  of  which  only  the  very  basics  will  be 
considered  here  for  illustrative  purposes.  As  a  glimpse  into  the  nature  of  this  complexity,  we  note  that  further 
refinement  of  the  SCO  model  involves  the  modeling  of  a  rule-based  production  system  as  an  object  that  includes  all 
the  behavior  necessary  for  processing  rules  of  varying  formats  and  combinations,  as  well  as  including  all  the  data 
needed  by  the  rules.  As  such,  production  rules  obviously  are  one  aspect  of  the  model  that  are  again  applicable 
across  feature  classes  and  representations.  The  generic  model,  or  template,  for  production  rules  is  thus  applicable  at 
the  SCO  level.  However,  specific  instantiations  of  rules  must  incorporate  data  and  knowledge  from  lower  levels, 
including  both  the  feature  class  level  (this  is  illustrated  in  the  rule  example  given  previously  for  railroad  attributes), 
as  well  as  the  second  level  of  the  hierarchy  shown  in  figure  1  for  point,  line  and  area  classes. 

The  weights  generated  by  the  production  rule  system  are  intermediate  data,  and  are  thus  considered  as  part  of 
the  object  model  for  that  system.  No  knowledge  of  the  values  of  these  weights  is  required  of  the  individual  features 
involved  in  conflation.  The  attribute  matching  score,  as  well  as  matching  scores  for  the  various  other  categories  of 
matching  criteria,  are  modeled  as  instance  variables  that  are  inherited  by  the  feature  classes  (figure  3).  Each 
geographic feature maintains knowledge of these intermediate scores to compute a final matching score
(matchingScore). The intermediate scores are maintained so that knowledgeable users can view the results as part
of  a  possible  conflation  verification  procedure.  The  equation  given  for  computing  the  attribute  matching  score  is 
valid  across  feature  classes  and  thus  belongs  in  the  SCO.  However,  knowledge  for  interpreting  the  results  is 
conceivably  different  across  feature  classes,  and  thus  is  maintained  at  the  lower  level  of  the  model  hierarchy. 


Allocation  Issues  and  the  Distributed  Model 

Though  irrelevant  at  a  conceptual  level,  physical  allocation  of  data  in  a  distributed  system  is  of  paramount 
concern  for  performance  issues.  Here,  we  briefly  consider  one  possible  allocation  scheme  and  the  potential  impact 
as  related  to  the  SCO  model. 

The  scheme  we  consider  is  a  partitioning  based  on  representational  classes.  That  is,  all  line  features  such  as 
railroad  lines,  power  lines,  etc.,  reside  on  the  same  physical  node;  all  point  features  are  likewise  allocated  to  the 
same  node,  as  are  all  area  features.  In  consideration  of  the  hierarchy  given  in  figure  1  within  this  setup,  we  believe 
the  optimal  partitioning  of  the  hierarchy  is  to  include  all  level  3  (feature  class)  classes  and  their  instantiations 
together  with  their  corresponding  level  2  superclass.  For  example,  all  transportation  lines,  utility  lines,  etc.  would  be 
co-resident with the line class object. For conflation, this setup would minimize network traffic needed for transfer
of data and execution of methods, as most of the specific conflation knowledge is inherited from the SCO and values
instantiated at these lower two levels. Of course, other partitionings may also provide acceptable performance, but
this one is given as an example of considerations between data allocation methods and the distributed conflation
model.


SUMMARY  AND  FUTURE  WORK 

The  conflation  model  presented  in  this  paper,  together  with  the  discussion  on  attribute  matching,  illustrates  how 
the  various  pieces  of  data  and  processes  related  to  uncertainty  are  distributed  throughout  the  various  levels  of  an 
object  hierarchy.  The  inheritance  and  encapsulation  of  this  knowledge  enables  conflation  in  a  distributed 
environment to take place transparently to the user, and without the need for a centralized conflation "manager." We
believe the use of OO techniques for the development of a general conflation model is crucial to enabling complex
conflation  in  a  distributed  environment.  The  use  of  an  approach  such  as  this  has  far-reaching  implications  and  can 
tremendously  impact  progress  in  the  crucial  area  of  spatial  data  interoperability.  For  example,  implementation  of 
this  approach  could  enable  DoD  and  civilian  mapping  applications  to  seamlessly  utilize  NIMA,  USGS,  etc., 
holdings  effectively. 

Of  course,  further  refinement  of  the  model  is  needed  to  enable  the  handling  of  conflation  for  all  feature  types. 
Expansion  must  be  made  in  the  geometric  and  topological  matching  portions  of  the  model.  One  issue  related  to  the 
implementation  of  the  model  is  the  process  of  knowledge  acquisition  for  the  uncertainty-based  components. 
Interaction  with  cartographic  experts  in  the  field  of  conflation  is  a  necessary  step  in  providing  design  and 
implementation-level  details  for  the  rule-based  component  of  the  model. 

Another issue upon which we will be expanding is that of deconfliction. Whereas for now we simply select the
feature with the best overall quality, we are planning the development of a process that will allow combinations
of the best parts of matching features. Ideally, this will provide a "super" feature that is better in all or most
respects than any of the single matching features. Finally, the idea of a parallel conflation algorithm is an intriguing
concept for future work within a distributed environment. Initial implementation of the distributed conflation model
is currently underway at the Naval Research Laboratory, and we anticipate reporting on preliminary results in the
near future.
Abstract

The issue of spatial querying accuracy has been
viewed as critical to the successful implementation and
long-term viability of GIS technology. In order to
improve spatial querying accuracy and quality, the
problems associated with fuzziness and
uncertainty are of great concern in the spatial database
community. There has been a strong demand for
approaches that deal with inaccuracy and uncertainty
in GIS. In this paper, we develop an
approach that can perform fuzzy spatial querying under
uncertainty. An inexact inferencing strategy is
investigated. The study shows that the fuzzy set and the
certainty factor can work together to deal with spatial
querying. Querying examples implemented in
FuzzyCLIPS are also provided.

Keywords: uncertainty, inexact inferencing, fuzzy
inference, spatial query, GIS, FuzzyCLIPS

1.  Introduction 

Since spatial querying deals with concepts
expressed in verbal language, fuzziness is
frequently involved. Hence, the ability to query spatial
data under fuzziness is one of the most important
characteristics of any spatial database. Some
researchers have shown that directional as well as
topological relationships are fuzzy concepts [1-2]. To
support queries of this nature, our earlier works [3-6]
provided a basis for fuzzy querying capabilities based
on a binary model. The CLIPS-based implementation [6]
shows that fuzzy querying can distinguish various cases
within the same relation class. For instance, consider the
example relationship Object A overlaps Object B. The
fuzzy querying can answer: does all of Object A overlap
some of Object B, or does little of Object A overlap
most of Object B?

However,  in  these  kinds  of  fuzzy  queries,  the 
representation  of  the  fuzzy  variables  is  based  on 
classical  set  theory.  Although  classical  sets  are  suitable 
for  various  applications  and  have  proven  to  be  an 
important  tool  for  mathematics  and  computer  science, 
they  do  not  reflect  the  nature  of  human  concepts  and 
thoughts,  which  tend  to  be  abstract  and  imprecise.  The 
flaw  comes  from  the  sharp  transition  between  inclusion 
and exclusion in a set. In this paper we show a way to
use fuzzy sets to deal with the vague meaning of
linguistic terms, in which the smooth transition is
characterized by a membership function.

Queries expressed in verbal language often
involve a mixture of uncertainties in the outcomes that
are governed by the meaning of linguistic terms.
Therefore, there is a need for an
inexact inferencing approach to handle the
uncertain features [7]. Uncertainty occurs when one is
not absolutely certain about a piece of information.
Although uncertainty is an inevitable problem in spatial
queries, there are clear gaps in our understanding of
how to incorporate uncertain reasoning into the spatial
querying process. This requires performing inexact
inferencing. Recently, models of uncertainty have been
proposed for spatial information that incorporate ideas
from natural language processing, the value of
information concept, non-monotonic logic, and fuzzy
set, evidential and probability theory. Each model is
appropriate for a different type of inexactness in spatial
data. By incorporating fuzzy set and confirmation
theory, we investigate an inexact inferencing approach
for fuzzy spatial querying. The aim is to improve spatial
querying accuracy and quality.

The  paper  is  organized  as  follows.  Section  2  briefly 
overviews  our  previous  works,  and  shows  some  basic 
techniques  and  strategies  to  deal  with  fuzzy  multiple 
relations  in  spatial  querying.  Section  3  describes  an 
approach  that  can  perform  fuzzy  querying  under  the 
uncertainties.  In  section  4,  FuzzyCLIPS  implementation 
shows  some  improved  querying  results. 

2.  An  Overview  of  Previous  Works 

Assume  that  the  spatial  objects  can  be  approximated 
by  their  minimum  bounding  rectangles  (MBR).  Figure  1 
shows  two  objects  in  two  dimensions.  Based  on  the 
spatial  binary  model  [3-6],  some  spatial  querying 
techniques  and  strategies  can  be  briefly  overviewed  as 
follows. 


Figure 1. Two objects in 2-D


2.1  Basic  Spatial  Querying 

Topological  and  directional  relationships  are  critical 
components  in  the  retrieval  of  information  from  spatial 
databases,  including  image,  map  and  pictorial 
databases.  Many  contributions  have  been  made.  The 
authors  in  [10]  define  new  families  of  fuzzy  directional 
relations  in  terms  of  the  computation  of  force 
histograms,  which  is  based  on  the  raster  data.  In  this 
paper,  we  will  take  into  account  these  two  major  spatial 
relations  based  on  the  vector  data. 

The topological relationships express the concepts of
inclusion and neighborhood. A large body of related
work has focused on the intersection model, which
describes relations using intersections of objects'
interiors and boundaries. By means of geometrical
similarity, we defined the topological relationships as a
set:

T = {disjoint, tangent, surround, overlaps, ...}.

The paper [3] provides greater detail on this.

Directional relationships arise commonly
in everyday life. The most common directions are the
cardinal directions and their refinements. We defined the
directional relations as the following set:

D={North,  East,  South,  West,  Northeast, 
Southeast,  Southwest,  Northwest}. 


Such  relationships  provided  a  significant  resource  for 
the  basic  binary  spatial  queries.  The  examples  of  such 
queries  might  look  like  these: 

Object  A  overlaps  Object  B. 

Object  A  is  west  of  Object  B. 

2.2  Fuzzy  Spatial  Querying 

Although the above querying method can provide
topological and directional information, this kind of
information is not associated with any degree. This
means it can only perform a low-level query. A typical
example is shown in Figure 2.


Figure  2. 


For  both  cases  that  belong  to  the  same  class  (or 
relation  group),  the  basic  spatial  querying  will  provide 
the  same  topological  and  directional  relationships,  i.e. 
Object  A  overlaps  Object  B  and  Object  A  is  west  of 
Object  B. 

How to provide more accurate information, such as
most of Object A overlaps some of Object B, or little of
Object A overlaps some of Object B and so on,
encouraged us to investigate further. Some
strategies and techniques can be briefly described as
follows (see the details in [6]).

•  Partition  each  object  into  sub-groups  in  eight 
directions  based  on  the  reference  area  (the  common 
part  of  two  objects)  shown  in  Figure  3; 

•  Map  each  sub-group  to  a  node,  and  assign  two 
weights  (area  and  node  weights)  to  each  node; 

•  Calculate the two weights to determine the special
degree.

Figure 3. Partitioning two objects in 2D

where the area weight is calculated by

AW = (area of sub-group) / (area of the entire object)

and the node weight is obtained by

NW = AW × (axis length) / (longest axis length).
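
The two weights can be illustrated with a small sketch on minimum bounding rectangles. The rectangles are the ones used later in Section 5, A(1, 1)(7, 2) and B(2, 1)(9, 4); the helper names, the (x1, y1, x2, y2) tuple layout, and the way the west sub-group is cut from A at the edge of the reference area are assumptions for illustration, and the axis lengths passed to node_weight are placeholder values because the text does not define them in detail.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = r
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def area_weight(sub_group, whole):
    # AW = (area of sub-group) / (area of the entire object)
    return rect_area(sub_group) / rect_area(whole)


def node_weight(aw, axis_length, longest_axis_length):
    # NW = AW x (axis length) / (longest axis length)
    return aw * axis_length / longest_axis_length


A = (1, 1, 7, 2)
B = (2, 1, 9, 4)

# Reference area: the common part of the two objects.
reference = (max(A[0], B[0]), max(A[1], B[1]), min(A[2], B[2]), min(A[3], B[3]))

# Assumed west sub-group of A: the strip of A lying left of the reference area.
west_of_a = (A[0], reference[1], reference[0], reference[3])

aw_west = area_weight(west_of_a, A)        # 1/6, about 0.1667
nw_west = node_weight(aw_west, 1.0, 1.0)   # placeholder axis lengths
```

Under these assumptions the west sub-group of A has area weight 1/6, which agrees with the value 0.1667 that appears in the querying example of Section 5.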
In order to support fuzzy querying, the resulting
quantitative figures (AW, NW) are mapped to ranges
that correspond to terms known as linguistic
qualifiers. There is a large body of knowledge and
techniques that deal with fuzzy spatial relations in
linguistic expression. In this paper, we define the
topological qualifier TQ and directional qualifier DQ
as:

TQ = {all, most, some, little, none}

DQ = {directly, mostly, somewhat, slightly, not}.

As mentioned in [9], relative qualifiers can be
represented as fuzzy subsets of the unit interval and
expressed with linguistic words. Based on the classical set, the
membership function of qualifiers can be defined as a
binary set, that is, complete membership has a value of
1, and no membership has a value of 0. The following
tables give the quantifying description.


Table 1. Topological Qualifiers

Topological Qualifiers (TQ)    Area Weight (AW)
all                            0.96 to 1.00
most                           0.60 to 0.95
some                           0.30 to 0.59
little                         0.06 to 0.29
none                           0.00 to 0.05

Table 2. Directional Qualifiers

Directional Qualifiers (DQ)    Node Weight (NW)
directly                       0.96 to 1.00
mostly                         0.60 to 0.95
somewhat                       0.30 to 0.59
slightly                       0.06 to 0.29
not                            0.00 to 0.05

For the objects shown in Figure 1, the CLIPS-based implementation
can provide the following information:

Most  of  Object  A  overlaps  Object  B 
Object  A  overlaps  some  of  Object  B 


Most  of  Object  A  overlaps  some  of  Object  B 

Most  of  Object  A  is  west  of  Object  B 
Object  A  is  mostly  west  of  Object  B 


Most  of  Object  A  is  mostly  west  of  Object  B 

3.  Fuzzy  Querying  Under  Uncertainty 

Because spatial relationships depend on human
interpretation, spatial querying should be expressed through
fuzzy concepts. To support queries of this nature,
previous works provided fuzzy queries without
uncertainty that handle fuzziness by defining
fuzzy qualifiers. However, in these kinds of fuzzy
queries, the grades of membership have been
defined as classical sets. The problem is that there exists a
gap between two neighboring members such as 'all' and
'most'. Because a jump occurs, no qualifier is defined in
some intervals, for example the interval (0.95, 0.96). To
improve the fuzzy querying, we apply fuzzy set theory
in our continuing research.
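
The gap between neighboring crisp qualifiers can be seen with a short lookup over the intervals of Table 1. This is only a sketch: the list layout and the function name crisp_qualifier are illustrative choices, and returning None for an unclassified weight is an assumption about how the gap would surface.

```python
# Crisp intervals for the topological qualifiers, taken from Table 1.
CRISP_TQ = [
    ("all", 0.96, 1.00),
    ("most", 0.60, 0.95),
    ("some", 0.30, 0.59),
    ("little", 0.06, 0.29),
    ("none", 0.00, 0.05),
]


def crisp_qualifier(aw):
    """Return the qualifier whose closed interval contains aw, else None."""
    for name, low, high in CRISP_TQ:
        if low <= aw <= high:
            return name
    return None  # aw falls in a gap between two neighboring intervals


inside = crisp_qualifier(0.90)    # "most"
in_gap = crisp_qualifier(0.955)   # None: no qualifier covers (0.95, 0.96)
```

Any area weight strictly between 0.95 and 0.96 is left without a qualifier, which is the jump that motivates replacing the crisp intervals with overlapping fuzzy sets.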


3.1.  Fuzziness  Consideration 


Fuzziness  occurs  when  the  boundary  of  a  piece  of 
information  is  not  clear-cut.  Hence,  fuzzy  querying 
expands  query  capabilities  by  allowing  for  ambiguity 
and partial membership. The definition of the grades of
membership is subjective and depends on human
interpretation. Ways to eliminate this subjectivity are another
field of research. Here, simple membership
functions are considered.

A fuzzy set is a set without a crisp boundary. The
smooth transition is characterized by membership
functions that give fuzzy sets flexibility in linguistic
expressions. More formally, a fuzzy set in a universe U is
characterized by a membership function $\mu: U \to [0,1]$.
Figure 4 illustrates the primary terms of the fuzzy variable
area weight. Each term represents a specific fuzzy set.


Figure 4. Membership function for TQ


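The fuzzy set functions for topological qualifiers can be
described as:

\[
\mu_{all}(AW) =
\begin{cases}
1.0 & \text{if } 0.95 \le AW \le 1.0 \\
20\,(AW - 0.80)/3 & \text{if } 0.8 \le AW \le 0.95
\end{cases}
\]

\[
\mu_{most}(AW) =
\begin{cases}
20\,(0.95 - AW)/3 & \text{if } 0.8 \le AW \le 0.95 \\
1.0 & \text{if } 0.6 \le AW \le 0.80 \\
10\,(AW - 0.5) & \text{if } 0.5 \le AW \le 0.6
\end{cases}
\]

\[
\mu_{some}(AW) =
\begin{cases}
10\,(0.6 - AW) & \text{if } 0.5 \le AW \le 0.6 \\
1.0 & \text{if } 0.3 \le AW \le 0.5 \\
10\,(AW - 0.2) & \text{if } 0.2 \le AW \le 0.3
\end{cases}
\]

\[
\mu_{little}(AW) =
\begin{cases}
10\,(0.3 - AW) & \text{if } 0.2 \le AW \le 0.3 \\
1.0 & \text{if } 0.02 \le AW \le 0.2 \\
100\,(AW - 0.01) & \text{if } 0.01 \le AW \le 0.02
\end{cases}
\]

\[
\mu_{none}(AW) =
\begin{cases}
100\,(0.02 - AW) & \text{if } 0.01 \le AW \le 0.02 \\
1.0 & \text{if } 0.0 \le AW \le 0.01
\end{cases}
\]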

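In the same way, the fuzzy set functions for
directional querying can be described as:

\[
\mu_{directly}(NW) =
\begin{cases}
1.0 & \text{if } 0.95 \le NW \le 1.0 \\
20\,(NW - 0.80)/3 & \text{if } 0.8 \le NW \le 0.95
\end{cases}
\]

\[
\mu_{mostly}(NW) =
\begin{cases}
20\,(0.95 - NW)/3 & \text{if } 0.8 < NW \le 0.95 \\
1.0 & \text{if } 0.6 < NW \le 0.80 \\
10\,(NW - 0.5) & \text{if } 0.5 \le NW \le 0.6
\end{cases}
\]

\[
\mu_{somewhat}(NW) =
\begin{cases}
10\,(0.6 - NW) & \text{if } 0.5 \le NW \le 0.6 \\
1.0 & \text{if } 0.3 \le NW \le 0.5 \\
10\,(NW - 0.2) & \text{if } 0.2 \le NW \le 0.3
\end{cases}
\]

\[
\mu_{slightly}(NW) =
\begin{cases}
10\,(0.3 - NW) & \text{if } 0.2 \le NW \le 0.3 \\
1.0 & \text{if } 0.02 \le NW \le 0.2 \\
100\,(NW - 0.01) & \text{if } 0.01 \le NW \le 0.02
\end{cases}
\]

\[
\mu_{not}(NW) =
\begin{cases}
100\,(0.02 - NW) & \text{if } 0.01 \le NW \le 0.02 \\
1.0 & \text{if } 0.0 \le NW \le 0.01
\end{cases}
\]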


Unlike classical set theory, which describes
membership in a set crisply, in fuzzy set theory
membership of a term in a set is partial, i.e., a term
belongs to a set with a certain grade of membership.
Although this solves the gap problem of the classical set
expression, a new problem arises. Because
fuzzy sets commonly overlap, a
qualifier may be associated with two different terms in
the intersecting intervals. For instance, the topological
qualifier TQ may take 'all' and 'most' at the same time.
This reveals uncertainty: the lack of adequate and
correct information to make a decision.


How do we make the decision according to the
information? Which querying information is reliable?

This reveals important deficiencies in areas such as
the reliability of queries and the ability to detect
inconsistencies in the knowledge. Because we cannot be
completely certain that some qualifiers are true or others
are false, we construct a certainty factor (CF) to
evaluate the degree of certainty. The degree of certainty
is usually represented by a crisp numerical value on a
scale from 0 to 1. A certainty factor of 1 indicates that it
is very certain that a fact is true, and a certainty factor
of 0 indicates that it is very uncertain that a fact is true.
Some key ideas relevant to the determination of the CF are
discussed as follows.

Case 1. Considering a single qualifier for each query

This is a case in which only one qualifier associated
with a single object is involved in each querying result,
such as:

All of Object A overlaps Object B
Object A is directly west of Object B.

Here the fuzzy topological qualifier $TQ_a$ = 'all'
is associated with object A, and the fuzzy directional
qualifier $DQ_a$ = 'directly' is associated with
object A.

•  If the qualifier takes only one term in the given
interval, the grade of membership $\mu(\cdot)$ can be used
as a CF that represents the degree of belief. The
results will look like:

All of Object A overlaps Object B
with CF = $\mu_{all}(AW_{a_i} = 0.99)$ = 1.0

Object A is directly west of Object B
with CF = $\mu_{directly}(NW_{a_j} = 0.99)$ = 1.0

where $AW_{a_i}$ is the area weight of a sub-group
associated with object A; $NW_{a_j}$ is the node weight of a
sub-group associated with object A; and $i, j \in I[1, 8]$, where I
represents an integer set.


3.2.  Uncertainty  Consideration 

Uncertainty is an inevitable problem in GIS. In this
paper, we explore an approach that
can perform fuzzy querying under uncertainty. The
study examines whether the fuzzy set and the certainty
factor can be combined in spatial querying.

Uncertainty occurs when one is not absolutely
certain about a piece of information. Given AW = 0.90,
the fuzzy querying may give the following querying
phrases:

All  of  Object  A  overlaps  Object  B 

Most  of  Object  A  overlaps  Object  B. 


•  If the given weight is in the overlapping area, two
qualifiers will be related. For example, the fuzzy
topological qualifier of object A takes both 'all'
and 'most'. The querying results will be:

All of Object A overlaps Object B
Most of Object A overlaps Object B

It is acceptable to take the qualifier that has the larger
grade of membership. The certainty factor can be
determined by the maximum value, that is,

\[ CF = \max\{\mu_{all}(AW_{a_i} = 0.90),\ \mu_{most}(AW_{a_i} = 0.90)\} = \mu_{all}(AW_{a_i} = 0.90), \]

since the membership functions for the topological qualifiers give
$\mu_{all}(0.90) = 20\,(0.90 - 0.80)/3 \approx 0.67$ and
$\mu_{most}(0.90) = 20\,(0.95 - 0.90)/3 \approx 0.33$.
The final querying result should be

All of Object A overlaps Object B
with CF = $\mu_{all}(AW_{a_i} = 0.90)$.
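
The selection by maximum grade for AW = 0.90 can be checked numerically. In this sketch the two membership functions follow the piecewise definitions given for 'all' and 'most'; the function names and the decision to return 0.0 outside the stated intervals are assumptions made for the example.

```python
def mu_all(aw):
    """Membership grade of aw in the fuzzy set 'all'."""
    if 0.95 <= aw <= 1.0:
        return 1.0
    if 0.8 <= aw < 0.95:
        return 20.0 * (aw - 0.80) / 3.0
    return 0.0


def mu_most(aw):
    """Membership grade of aw in the fuzzy set 'most'."""
    if 0.8 < aw <= 0.95:
        return 20.0 * (0.95 - aw) / 3.0
    if 0.6 <= aw <= 0.8:
        return 1.0
    if 0.5 <= aw < 0.6:
        return 10.0 * (aw - 0.5)
    return 0.0


def certainty_factor(aw):
    """Pick the qualifier with the larger grade; that grade is the CF."""
    candidates = {"all": mu_all(aw), "most": mu_most(aw)}
    best = max(candidates, key=candidates.get)
    return best, candidates[best]


qualifier, cf = certainty_factor(0.90)   # 'all' with CF of about 0.667
```

The grade of 'most' at the same weight is about 0.333, so 'all' is reported together with its grade as the certainty factor.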

As a result, the CF in case 1 can be obtained by

\[ CF = \max\{\mu_{TQ_k}(AW_{\bullet i} = const),\ k \in I[1,5],\ i \in I[1,8]\} \]
\[ CF = \max\{\mu_{DQ_k}(NW_{\bullet j} = const),\ k \in I[1,5],\ j \in I[1,8]\} \]

where

$TQ_k$ is a topological qualifier such as all;
$DQ_k$ is a directional qualifier such as directly;
$AW_{\bullet i}$ is an area weight associated with object node i;
$NW_{\bullet j}$ is a node weight associated with object node j;
$\bullet$ is used to represent object A or B.

Case 2. Considering multiple qualifiers

In the querying results, several fuzzy terms
may be conjoined (i.e., joined by AND) or
disjoined (i.e., joined by OR). Examples of these
types of queries are as follows:

Most of Object A overlaps some of Object B

Some of Object A is slightly south of Object B.

Hence, to perform these kinds of queries, we have to
handle multiple fuzzy qualifiers. The relationship
between qualifiers of different objects
is conjunction, and the relationship between qualifiers
of the same object is disjunction. According to fuzzy set
theory, the conjunction and disjunction of fuzzy terms
can be defined as the minimum and
maximum of the involved facts, respectively. Therefore, the certainty
factor for multiple qualifiers can be determined
by the following formulas:


Consider topological relationships:

\[ CF = \min\{\ \max\{\mu_{TQ_{k_a}}(AW_{a_i} = \alpha)\},\ \max\{\mu_{TQ_{k_b}}(AW_{b_j} = \alpha)\}\ \}, \quad k_a, k_b \in I[1,5],\ i, j \in I[1,8] \]

Note: a topological qualifier $TQ_1$ = all if $k_a = 1$.

Consider topological/directional relationships:

\[ CF = \min\{\ \max\{\mu_{TQ_{k_a}}(AW_{a_i} = \alpha)\},\ \max\{\mu_{DQ_{k_b}}(NW_{\bullet j} = \beta)\}\ \}, \quad k_a, k_b \in I[1,5],\ i, j \in I[1,8] \]

where $a_i$ and $b_j$ represent object nodes associated
with objects A and B, respectively, and
$\alpha$ and $\beta$ are constants.


As seen above, we have developed an approach in which fuzzy sets
and uncertainty are incorporated to perform fuzzy
queries.
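
The min-of-max combination for multiple qualifiers can be written in a few lines. This sketch assumes the membership grades have already been computed and are passed in as dictionaries; the function name combined_cf and the grades used in the second call are hypothetical values chosen only to show the arithmetic.

```python
def combined_cf(grades_first, grades_second):
    """Certainty factor for a query that joins two qualifiers.

    Each argument maps the candidate terms of one qualifier to their
    membership grades. Terms competing for the same qualifier are
    disjoined (max); the two qualifiers are then conjoined (min).
    """
    return min(max(grades_first.values()), max(grades_second.values()))


# Grades reported in the Section 5 example: 'little' and 'slightly' are
# both 1.0 at a weight of 0.1667.
cf_example = combined_cf({"little": 1.0}, {"slightly": 1.0})

# Hypothetical grades where each qualifier has two competing terms.
cf_mixed = combined_cf({"all": 2.0 / 3.0, "most": 1.0 / 3.0},
                       {"some": 0.4, "little": 0.6})
```

In the second call the first qualifier contributes its larger grade 0.667 and the second contributes 0.6, so the combined certainty factor is 0.6.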


4.  Fuzzy  CLIPS  Implementation 

FuzzyCLIPS is an enhanced version of CLIPS
developed at the National Research Council of Canada
to allow the implementation of fuzzy expert systems.
The modifications made to CLIPS add the capability
of handling fuzzy concepts and reasoning. It allows any
mix of fuzzy and normal terms, numeric-comparison
logic controls, and uncertainties in the rules and facts. By
using FuzzyCLIPS, it is easy to deal with fuzziness
in approximate reasoning and to manipulate uncertainty in
the rules and facts.

In the process of our implementation, all fuzzy
variables are predefined with the deftemplate statement.
This is an extension of the standard deftemplate
construct in CLIPS. For example, fuzzy variables
(qualifiers) can be declared in deftemplate constructs as
follows:


(deftemplate TFVariable
   0 1   ; define the fuzzy variable area-weight
   ((all    (0.8 0.0) (0.95 1.0) (1.0 1.0))
    (most   (0.5 0.0) (0.6 1.0) (0.8 1.0) (0.95 0.0))
    (some   (0.2 0.0) (0.3 1.0) (0.5 1.0) (0.6 0.0))
    (little (0.01 0.0) (0.02 1.0) (0.2 1.0) (0.3 0.0))
    (none   (0.0 1.0) (0.01 1.0) (0.02 0.0))))

(deftemplate DFVariable
   0 1   ; define the fuzzy variable node-weight
   ((directly (0.8 0.0) (0.95 1.0) (1.00 1.0))
    (mostly   (0.5 0.0) (0.60 1.0) (0.80 1.0) (0.95 0.0))
    (somewhat (0.2 0.0) (0.30 1.0) (0.50 1.0) (0.60 0.0))
    (slightly (0.01 0.0) (0.02 1.0) (0.20 1.0) (0.30 0.0))
    (not      (0.0 1.0) (0.01 1.0) (0.02 0.0))))
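
To make the term definitions concrete, the following Python sketch evaluates the TFVariable terms by linear interpolation between the listed (x, membership) points. It is an illustrative reimplementation, not FuzzyCLIPS itself. It assumes that the points are joined by straight lines and that the first and last membership values hold outside the listed range. The names `TF_TERMS`, `membership`, and `matching_terms` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, membership degree)

# Term definitions copied from the TFVariable deftemplate above.
TF_TERMS: Dict[str, List[Point]] = {
    "all":    [(0.8, 0.0), (0.95, 1.0), (1.0, 1.0)],
    "most":   [(0.5, 0.0), (0.6, 1.0), (0.8, 1.0), (0.95, 0.0)],
    "some":   [(0.2, 0.0), (0.3, 1.0), (0.5, 1.0), (0.6, 0.0)],
    "little": [(0.01, 0.0), (0.02, 1.0), (0.2, 1.0), (0.3, 0.0)],
    "none":   [(0.0, 1.0), (0.01, 1.0), (0.02, 0.0)],
}


def membership(points: List[Point], x: float) -> float:
    """Linearly interpolate the membership degree of x (assumed semantics)."""
    if x <= points[0][0]:
        return points[0][1]
    if x >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("points must be sorted by x")


def matching_terms(x: float) -> Dict[str, float]:
    """Return every term whose membership degree at x is non-zero."""
    degrees = {name: membership(pts, x) for name, pts in TF_TERMS.items()}
    return {name: round(d, 3) for name, d in degrees.items() if d > 0.0}


print(matching_terms(0.8333))  # prints {'all': 0.222, 'most': 0.778}
```

An area weight of 0.8333 therefore falls into both "all" and "most", which is why a weight can map to more than one qualifier.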


A number of commands supplied in FuzzyCLIPS
help users access the fuzzy components that
they need. In our application, once the weights (fuzzy
variables) are calculated, the only information of interest
is the value of the fuzzy set at the specified weight
value. The command get-fs-value provides
this value. The syntax of the command is:

(get-fs-value ?<fact-variable> <number>) or
(get-fs-value <integer> <number>) or
(get-fs-value <fuzzy-value> <number>),

where <number> is a value that must lie between the
lower and upper bounds of the universe of discourse for
the fuzzy set. A simple example looks like:

(assert (TFVariable most))

(defrule Get-CF
   ?f <- (TFVariable ?cf)
   =>
   (printout t "CF for " ?cf " is " (get-fs-value ?f AW) crlf)
   (retract ?f))
5. Querying Examples


\[
\left.
\begin{aligned}
\mu_{little}(AW_{a_7} = 0.1667) &= 1.0 \\
\mu_{slightly}(NW_{a_7} = 0.1667) &= 1.0
\end{aligned}
\right\} \min = 1.0
\]


Consider two objects A (1, 1) (7, 2) and B (2, 1) (9, 4).
The previous work based on CLIPS provides the
following query information.

Query  results  of  binary 
spatial  relationships 

2D  physical  relations:  A  I  os  I  B. 

Topological  relations:  A  {overlaps}  B. 
Directional relations: A {South} B
                       A {South-West} B
                       A {West} B

Little of Object A is West of Object B
Object A is slightly West of Object B

=> Little of A is slightly West of B.

Based on FuzzyCLIPS, the query results would look
like:

Fuzzy  Query  results  with  certainty  factor 


Topological information:

83% of A overlaps 23.8% of B

Most of A overlaps some of B with CF = 0.778

Directional information:

Little of A is West of B with CF = 1.0
A is slightly West of B with CF = 1.0

=> Little of A is slightly West of B
with CF = 1.0

More details for analysis are provided as follows.
Table 3 shows part of the quantitative information stored in
the nodes associated with each object.


Table 3. Quantitative information

Object Name   Node     AW       NW
Object A      center   0.8333   -
              west     0.1667   0.1667
Object B      center   0.2380   -
              north    0.4762   0.2655
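
The AW values of Table 3 can be reproduced from the object coordinates under two assumptions made here for illustration: each object is an axis-aligned rectangle given by two opposite corners, and the area weight of a region is the fraction of the object's area lying in that region. Neither reading is spelled out in this section, although both match the tabulated values. The `Rect` type and the helper names are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        # A degenerate (inverted) rectangle has zero area.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)


def intersection(a: Rect, b: Rect) -> Rect:
    """Overlap of two rectangles (possibly degenerate, with zero area)."""
    return Rect(max(a.xmin, b.xmin), max(a.ymin, b.ymin),
                min(a.xmax, b.xmax), min(a.ymax, b.ymax))


A = Rect(1, 1, 7, 2)
B = Rect(2, 1, 9, 4)
overlap = intersection(A, B)

aw_a_center = overlap.area() / A.area()   # share of A that overlaps B
aw_b_center = overlap.area() / B.area()   # share of B that overlaps A

# Part of A lying west of B's left edge. This simple strip is only valid
# here because A's vertical extent lies entirely within B's.
west_of_b = Rect(A.xmin, A.ymin, min(A.xmax, B.xmin), A.ymax)
aw_a_west = west_of_b.area() / A.area()

print(round(aw_a_center, 4), round(aw_b_center, 4), round(aw_a_west, 4))
# prints 0.8333 0.2381 0.1667
```

The second value rounds to 0.2381, whereas Table 3 lists 0.2380, which suggests the tabulated figure was truncated rather than rounded. The NW values are not reproduced because their computation is not described in this section.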

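From these data, we know

\[
\begin{aligned}
AW_{a_0} &= 0.8333, & TQ &\rightarrow \{\text{all}, \text{most}\}; \\
AW_{a_7} &= 0.1667, & TQ &\rightarrow \{\text{little}\}; \\
AW_{b_0} &= 0.4762, & TQ &\rightarrow \{\text{some}\}; \\
NW_{a_7} &= 0.1667, & DQ &\rightarrow \{\text{slightly}\}.
\end{aligned}
\]

\[
\begin{aligned}
\max\{\mu_{all}(AW_{a_0} = 0.8333),\ \mu_{most}(AW_{a_0} = 0.8333)\} &= \max\{0.222,\ 0.778\} = 0.778, \\
\mu_{some}(AW_{b_0} = 0.4762) &= 1.0, \\
CF = \min\{0.778,\ 1.0\} &= 0.778.
\end{aligned}
\]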


6.  Conclusion 

In the real world, fuzziness and uncertainty can occur
simultaneously. To improve spatial querying accuracy,
our research investigates an inexact inferencing
approach that performs fuzzy querying under
uncertainty. The reliability of the query results is
judged by a certainty factor (CF). The improved fuzzy
querying is flexible and can return spatial
information in a wider variety of forms.